Our collection methodology is to collect everything from everywhere (not all kinds, though, only the standardized ones). I would love to see another version of these stats collected and released by someone else, to compare against our data. Preparing and maintaining stats is boring, costly, and mostly thankless (mainly used by media to laugh at us), and it will probably stay an afterthought for a long while.
I'm curious whether you store all the notes/events with their content, or just keep track of their IDs for stats purposes. If the former, I'm curious about your storage needs, since it would mean storing every note ever published. Do you crawl a fixed set of relays? Does it update constantly and automatically in real time? Or do you run it once every X amount of time?
We collect all events from all relays for search (only standard kinds), so yes, we store everything ever published; that's on the order of 1 TB of unique events at the moment. Stats only show the last 2/6 months, but they have to process everything to be correct. Stats are real time: a single process keeps all the numbers in RAM and handles incoming events as they arrive.
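A minimal sketch of what such an in-RAM, event-at-a-time stats process might look like. This is illustrative, not the actual implementation: the set of "standard" kinds, the counters tracked, and the event fields (which follow the basic Nostr event shape of `id`, `kind`, `pubkey`) are all assumptions. The key points it demonstrates are deduplicating events seen from multiple relays and updating counters incrementally as events arrive:

```python
from collections import Counter

# Illustrative subset of standardized event kinds; the real list is longer.
STANDARD_KINDS = {0, 1, 3, 6, 7}

class StatsProcessor:
    """Keeps all stats counters in RAM, updated one event at a time."""

    def __init__(self):
        self.seen_ids = set()           # the same event arrives from many relays
        self.events_by_kind = Counter() # per-kind event counts
        self.unique_pubkeys = set()     # distinct authors seen

    def handle_event(self, event):
        """Process one incoming event; returns True if it was counted."""
        if event["kind"] not in STANDARD_KINDS:
            return False                # skip non-standard kinds
        if event["id"] in self.seen_ids:
            return False                # duplicate from another relay
        self.seen_ids.add(event["id"])
        self.events_by_kind[event["kind"]] += 1
        self.unique_pubkeys.add(event["pubkey"])
        return True
```

In practice a process like this would subscribe to each relay over WebSocket and feed every incoming event into `handle_event`; since all state lives in memory, the numbers are always current.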
Very impressive. Why store the note content though?