i was talking about the data transferred between the client and the cache api per refresh if i dropped the fast checks, and specifically about my case with just under 300 games. for that situation the estimate holds up, but yeah, it would not scale.
still not sure if you're talking about storage or data transfer, but let's make some estimates.
for you: 500 games, 4.3 MB db. for me: 280 games, 3.6 MB db. that works out to roughly 9-13 KB per game, so about 10 KB on average. let's take the worst-case scenario and call it 30 KB of data per game.
currently the latest thread ids are in the 200,000s, and the forum has been operating since 2016 (or at least that's what i suspect, given that thread id 1 was posted in 2016), so 8 years of *all* threads, not just games, adds up to 200 thousand threads. worst case, and also optimistically assuming that both the forum and this checker live long enough to get there, let's say the forum will ever contain at most 1 million threads. (again, keep in mind that only a tiny fraction of those are games, and many others are deleted or made private.)
at 30 KB per thread and a maximum of 1 million threads that will ever exist, that's 30 GB of thread data. again, since only a very small fraction of that is games and some threads get deleted, maybe 5 GB will ever need to be stored on disk by my api. last i checked, the base plan with my hosting provider, the $5 a month vps, comes with 70 GB of nvme storage. so storage is not an issue.
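for reference, here's the same storage math as a few lines of python, using only the numbers above (decimal units, so 1 GB = 1,000,000 KB). it also shows that even storing every thread, not just games, would fit on the 70 GB disk.

```python
# back-of-the-envelope storage estimate; every input is a number quoted in this post
KB_PER_GAME_OBSERVED = {"you": 4.3e3 / 500, "me": 3.6e3 / 280}  # ~8.6 and ~12.9 KB
KB_PER_THREAD_WORST = 30       # worst case, about 3x the observed average
MAX_THREADS_EVER = 1_000_000   # generous ceiling; ~200k threads exist today
DISK_GB = 70                   # base $5/month vps plan, as last checked


def storage_gb(threads: int, kb_per_thread: float) -> float:
    """Total size in GB, using decimal units (1 GB = 1,000,000 KB)."""
    return threads * kb_per_thread / 1_000_000


for who, kb in KB_PER_GAME_OBSERVED.items():
    print(f"{who}: {kb:.1f} KB per game")

every_thread = storage_gb(MAX_THREADS_EVER, KB_PER_THREAD_WORST)
print(f"every thread ever, worst case: {every_thread:.0f} GB of {DISK_GB} GB")
# you: 8.6 KB per game
# me: 12.9 KB per game
# every thread ever, worst case: 30 GB of 70 GB
```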
now for requesting all the threads: currently there are 200k threads in total, of which 20k are in the games forum (just noticed while typing that you can see this on the homepage). again, that's 20k over the 8 years this forum has been running, so in the worst case it will reach at most 100k game threads. from what i tested, the rate limit seems to be about 1-2 requests per second. it's not exactly clear, but i tried fetching thread ids 1-200 using an asyncio rate limiter, and it handled a sustained 2 requests per second, with bursts of up to 6 requests back to back at the start of the scrape. but let's assume the worst case and say 0.5 requests per second.
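the specific asyncio rate limiter doesn't matter much here, so as a stand-in, here's a minimal token bucket written from scratch (`TokenBucket`, `fetch_thread` and `scrape` are hypothetical names, and the actual forum request is left out). a token bucket gives exactly the shape described above: a short burst while the bucket is full, then a steady rate. something like `rate=2, capacity=6` would mirror what i observed, and `rate=0.5, capacity=1` is the worst-case setting.

```python
import asyncio
import time
from typing import Iterable, List


class TokenBucket:
    """Minimal asyncio token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # starts full, which allows an initial burst
        self.updated = time.monotonic()
        self._lock = asyncio.Lock()

    async def acquire(self) -> None:
        async with self._lock:  # one waiter at a time, so a token is never spent twice
            while True:
                now = time.monotonic()
                # refill in proportion to elapsed time, capped at capacity
                self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                # sleep roughly until one full token has accumulated, then re-check
                await asyncio.sleep((1 - self.tokens) / self.rate)


async def fetch_thread(thread_id: int, bucket: TokenBucket) -> int:
    await bucket.acquire()
    # hypothetical placeholder: the real http request to the forum would go here
    return thread_id


async def scrape(ids: Iterable[int], rate: float, capacity: int) -> List[int]:
    bucket = TokenBucket(rate, capacity)  # created inside the running event loop
    return await asyncio.gather(*(fetch_thread(i, bucket) for i in ids))


if __name__ == "__main__":
    # worst-case assumption from above: 0.5 requests per second, no burst
    print(asyncio.run(scrape(range(1, 3), rate=0.5, capacity=1)))
```

holding the lock while sleeping is deliberate: waiting requests queue up behind it instead of all waking at once and racing for the same token.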
so with at most 100k game threads and 0.5 requests per second with no breaks, it would take 2.3 days to scrape more game threads than the forum will ever have. that's far more than we will ever get as traffic, and at a slower rate than the forum allows.
let's consider a more realistic scenario. there are 20k game threads; let's say that across the whole f95checker userbase there is at least 1 user for each thread (which there probably isn't). at 1 request per second, they could all be scraped in about 5.6 hours.
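both scrape-time figures are just thread count divided by request rate; as a quick sanity check:

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def scrape_seconds(threads: int, requests_per_second: float) -> float:
    """Time to fetch every thread once, back to back, at a fixed request rate."""
    return threads / requests_per_second


worst_days = scrape_seconds(100_000, 0.5) / SECONDS_PER_DAY       # max game threads, slow limit
realistic_hours = scrape_seconds(20_000, 1.0) / SECONDS_PER_HOUR  # today's game threads
print(f"worst case: {worst_days:.1f} days, realistic: {realistic_hours:.1f} hours")
# worst case: 2.3 days, realistic: 5.6 hours
```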
*but* the whole point of making this a cached api is that we don't constantly reach out to the forum. originally i mentioned a cache timeout of 6 or 12 hours, and in hindsight that would be too short. so let's use the same timeout as the checker: 7 days.
when a user refreshes their games, the cache api checks when each thread id was last cached. if that was less than 7 days ago, it returns the cached data and nothing hits f95zone. it's quite unlikely that everyone stops refreshing all their games for 7 days and then everyone refreshes at the same time, forcing the api to reach out to f95zone for 20k games at once and queueing for over 5 hours. what's more likely is that most requests will be served from the cache, the most popular games will be requested from it periodically, and every once in a while an entry will pass the 7 days and trigger a request to f95zone. with 20k games and a 7-day cache timeout, that averages out to about 2 requests per minute to f95zone.
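a minimal sketch of that freshness check, assuming a timestamp is stored next to each cached entry. it uses an in-memory dict and an injectable clock instead of redis so it runs on its own; `ThreadCache` and its `fetch` callback are hypothetical names. it also double-checks the ~2 requests per minute figure.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_TTL = 7 * 24 * 3600  # 7 days, the same timeout as the checker


class ThreadCache:
    """In-memory stand-in for the cache api's store; the real one would live in redis."""

    def __init__(self, fetch: Callable[[int], dict], ttl: float = CACHE_TTL,
                 clock: Callable[[], float] = time.time) -> None:
        self.fetch = fetch  # hypothetical: scrapes and parses one thread from the forum
        self.ttl = ttl
        self.clock = clock
        self.entries: Dict[int, Tuple[float, dict]] = {}  # thread id -> (cached_at, data)
        self.forum_requests = 0

    def get(self, thread_id: int) -> dict:
        now = self.clock()
        entry = self.entries.get(thread_id)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # cached less than 7 days ago: nothing hits the forum
        data = self.fetch(thread_id)  # missing or expired: refetch and restamp
        self.forum_requests += 1
        self.entries[thread_id] = (now, data)
        return data


def avg_forum_requests_per_minute(threads: int, ttl_seconds: float) -> float:
    """Steady-state rate if every thread keeps being requested and expires once per ttl."""
    return threads / (ttl_seconds / 60)


fake_now = [0.0]
cache = ThreadCache(fetch=lambda tid: {"id": tid}, clock=lambda: fake_now[0])
cache.get(1)
cache.get(1)              # served from the cache
fake_now[0] += CACHE_TTL  # 7 days later the entry has expired
cache.get(1)
print(cache.forum_requests)                                       # 2
print(f"{avg_forum_requests_per_minute(20_000, CACHE_TTL):.1f}")  # 2.0
```

with redis, letting keys expire on their own via `SET ... EX` would be an alternative to storing timestamps, at the cost of no longer having the stale copy around while the refetch is queued.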
and again, that's assuming there even is one person using f95checker for each of the 20k game threads, and that they are all refreshed within the same 7-day window.
as for my server being able to keep up, that could be a concern.
worst case, let's say the average f95checker user has 1000 games. again at a worst case of 30 KB per game, that's 30 MB for a full check. now, since the api is handling the periodic full checks, i would remove the option to run full rechecks from the checker interface. instead, refreshing would do a fast check against another api on my server, which just returns the last time each thread id's data changed, and the full info is fetched only when the checker sees that there is new data. let's say this happens for 100 of the 1000 games each week (superficial detail changes like the rating count won't count as a "change"), which means 100 full checks (still a worst-case estimate) to the cache api per user per week.
there are 16k downloads of f95checker 10.2 and 34k downloads across all releases. worst case, there are 50k users later down the line: 50k users doing 100 full checks of 30 KB each against my cache api per week is 150 GB per week, or about 600 GB of traffic per month for the full checks. my hosting provider gives a maximum of 40 TB of traffic per month with the base $5 vps.
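a rough sketch of the client side of that fast check, plus the traffic math. the endpoint shape is an assumption: here the fast-check api is taken to return a mapping of thread id to a unix timestamp of the last meaningful change, and `threads_to_refetch` and `monthly_full_check_gb` are made-up names for illustration.

```python
from typing import Dict, List


def threads_to_refetch(local: Dict[int, int], remote: Dict[int, int]) -> List[int]:
    """Compare the 'last changed' stamps from the fast-check api (remote) with the ones
    stored locally; only threads whose remote stamp is newer, or that were never
    fetched, need a full fetch from the cache api."""
    return sorted(tid for tid, changed in remote.items() if changed > local.get(tid, -1))


def monthly_full_check_gb(users: int, full_checks_per_week: int, kb_per_game: float,
                          weeks_per_month: int = 4) -> float:
    """Worst-case full-check traffic per month, in decimal GB."""
    return users * full_checks_per_week * kb_per_game * weeks_per_month / 1_000_000


# hypothetical stamps: 102 changed since the last fetch, 103 was never fetched
local = {101: 1_700_000_000, 102: 1_700_000_000}
remote = {101: 1_700_000_000, 102: 1_700_500_000, 103: 1_699_000_000}
print(threads_to_refetch(local, remote))       # [102, 103]
print(monthly_full_check_gb(50_000, 100, 30))  # 600.0
```

600 GB is 1.5% of the 40 TB monthly allowance.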
and that's before cloudflare. if done right, all of this is 100% aggressively cacheable with cloudflare. that's what i do for another project of mine, which also gets roughly 600 GB of traffic per month, of which only 22 GB reach my server.
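one way "done right" could look, sketched with python's stdlib http server: one GET url per thread id, with `Cache-Control` headers that let a shared cache keep the response. the route, ttl values and json body are hypothetical. also note that by default cloudflare only caches certain static file extensions, so this assumes a cache rule that makes the api paths eligible for caching and respects the origin's headers.

```python
import json
from http.server import BaseHTTPRequestHandler
from typing import Dict

EDGE_TTL = 3_600   # hypothetical: how long cloudflare may serve a cached copy
BROWSER_TTL = 300  # hypothetical: how long the checker itself may reuse a response


def cache_headers(edge_ttl: int = EDGE_TTL, browser_ttl: int = BROWSER_TTL) -> Dict[str, str]:
    # s-maxage applies to shared caches like a cdn, max-age to the end client
    return {"Cache-Control": f"public, max-age={browser_ttl}, s-maxage={edge_ttl}"}


class ThreadHandler(BaseHTTPRequestHandler):
    """Serves GET /thread/<id>; one url per thread id, so each response caches on its own."""

    def do_GET(self) -> None:
        parts = self.path.strip("/").split("/")
        if len(parts) != 2 or parts[0] != "thread" or not parts[1].isdecimal():
            self.send_error(404)
            return
        body = json.dumps({"id": int(parts[1])}).encode()  # placeholder for the cached game data
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        for name, value in cache_headers().items():
            self.send_header(name, value)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

to try it locally, `HTTPServer(("127.0.0.1", 8000), ThreadHandler).serve_forever()` (from `http.server`) serves it on port 8000. the per-thread url is the important part: a popular game gets an edge hit no matter what else a user has in their library, whereas a batch endpoint taking a user's whole list would almost never match another user's request. for comparison, 22 of 600 GB reaching the server in the other project means about 96% is served by cloudflare.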
again, these are all worst cases. i feel like it's gonna be fine. the only possible concern is whether my server can keep up cpu-wise, but with cloudflare caching on top, and considering this just serves data as-is rather than generating a webpage like f95zone has to, it should be fine.
the issue here is xenforo, which is a huge piece of software that manages lots of things. for each page request it probably has to handle your profile, settings, customizations, notifications, recommendations, any custom plugins they have, and so on. as sam said, the issue is the requests that reach xenforo (like the page we are on right now). i don't think things like the latest updates page are part of xenforo, and the checker api sam made for f95checker isn't under xenforo either.
actually, that checker api is similar to what i'm doing now: he has a php script call out to their redis instance, and since redis is designed for fast access to cached data, it just grabs the version for each thread id you want and returns it. my concept is similar, except that i also store all the parsed game data in redis, and i have to periodically fetch new data. but the point is, comparing to xenforo's performance issues doesn't mean much. we can't know until i have it set up and we try it, and i feel like it's gonna be fine, especially with cloudflare aggressively caching it on top.
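this isn't sam's actual script, just a sketch of the same pattern in python: one cheap redis lookup for a whole list of thread ids, with no page rendering involved. the key names are made up. with a real server, `client` would be a `redis.Redis(...)` instance from redis-py, whose `mget` takes a list of keys and returns one value per key (None when missing); here a tiny fake stands in so it runs without a server. the same call works for the parsed game data if it's stored as json strings.

```python
from typing import Dict, Iterable, List, Optional, Union

Value = Optional[Union[bytes, str]]


def versions_for(client, thread_ids: Iterable[int]) -> Dict[int, Optional[str]]:
    """Look up the stored version for many thread ids in a single MGET round trip.

    `client` is anything with a redis-py style `mget(keys)` method; the key naming
    used here is hypothetical."""
    ids = list(thread_ids)
    raw: List[Value] = client.mget([f"thread:{tid}:version" for tid in ids])
    # redis-py returns bytes unless the client was created with decode_responses=True
    return {tid: v.decode() if isinstance(v, bytes) else v for tid, v in zip(ids, raw)}


class FakeRedis:
    """Tiny in-memory stand-in so the sketch runs without a redis server."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self.data = data

    def mget(self, keys: List[str]) -> List[Optional[bytes]]:
        return [self.data.get(k) for k in keys]


client = FakeRedis({"thread:1:version": b"v0.5", "thread:2:version": b"1.0"})
print(versions_for(client, [1, 2, 3]))  # {1: 'v0.5', 2: '1.0', 3: None}
```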