You might be wondering how our bot creates the #hot111 charts. Here are some simplified insights.
Every Sunday our bot runs on our test system and generates the charts. We manually check the results for spam, statistics cheaters and errors.
Every Monday our bot runs in production. The full run takes 31 steps; here is a simplified breakdown into thirteen. The whole generation takes around two hours.
It fetches all Creative Commons licensed tracks via the APIs of Jamendo, SoundCloud, Free Music Archive and Archive.org and writes the data to our database. That’s 50,000 tracks.
It filters out all Creative Commons NC and ND tracks and keeps only the BY and BY-SA tracks.
The APIs have some glitches, so it deletes all duplicates.
It generates the first basic HotRank and HotHotHotRank.
It marks all spam tracks and statistics cheaters and applies the blacklist and whitelist.
It generates the HotRank and HotHotHotRank a second time, in more depth.
It purges all unpopular and irrelevant tracks. 1,000 tracks are left.
It gets the metadata via APIs from Facebook, Last.fm and YouTube.
It maps the subgenres to main genres.
It fixes the SoundCloud stats.
It generates the HotRank and HotHotHotRank one last time.
It generates and caches the genre tag cloud.
It generates the sitemaps for the search engines.
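For the curious, here is a minimal Python sketch of three of the steps above: the license filter, the duplicate removal and the subgenre mapping. It is not our production code. The Track fields, the license strings, the duplicate key (platform plus platform ID) and the tiny genre table are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical record layout, invented for this sketch.
@dataclass(frozen=True)
class Track:
    source: str      # platform the track came from, e.g. "jamendo"
    source_id: str   # ID of the track on that platform
    title: str
    license: str     # assumed short form, e.g. "BY", "BY-SA", "BY-NC"
    subgenre: str

# Only BY and BY-SA tracks are kept; NC and ND variants are dropped.
ALLOWED_LICENSES = {"BY", "BY-SA"}

# Illustrative mapping only; the real subgenre table is much larger.
MAIN_GENRES = {
    "deep house": "electronic",
    "trip hop": "electronic",
    "post-punk": "rock",
}

def keep_allowed_licenses(tracks):
    """Return only the tracks whose license is BY or BY-SA."""
    return [t for t in tracks if t.license.upper() in ALLOWED_LICENSES]

def drop_duplicates(tracks):
    """Keep the first occurrence of each (source, source_id) pair."""
    seen = set()
    unique = []
    for t in tracks:
        key = (t.source, t.source_id)
        if key not in seen:
            seen.add(key)
            unique.append(t)
    return unique

def main_genre(track):
    """Map a subgenre to its main genre, falling back to 'other'."""
    return MAIN_GENRES.get(track.subgenre.lower(), "other")
```

Calling drop_duplicates(keep_allowed_licenses(tracks)) chains the two filters in the same order as the steps above. A real duplicate check would probably also have to catch the same track uploaded to two platforms, which this sketch ignores.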
More steps are necessary, such as regularly keeping the tracks up to date and ensuring that tracks deleted on SoundCloud are also deleted in our database. The bot regularly checks for new YouTube videos, ensures that bot traffic doesn’t influence the internal stats, and recalculates the HotHotHotRank hourly.
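The cleanup of tracks that were deleted on SoundCloud could look roughly like the following Python sketch. The function names are hypothetical, and the still_exists check is passed in from outside because this post does not describe how the bot actually asks the platform.

```python
from typing import Callable, Iterable, List

def find_deleted(track_ids: Iterable[str],
                 still_exists: Callable[[str], bool]) -> List[str]:
    """Return the IDs of stored tracks that no longer exist on the platform.

    still_exists is a hypothetical callback; in practice it would wrap
    whatever lookup the platform offers for a single track.
    """
    return [tid for tid in track_ids if not still_exists(tid)]

# Usage with a fake lookup instead of a real API call.
online = {"a1", "c3"}
to_delete = find_deleted(["a1", "b2", "c3"], lambda tid: tid in online)
```

The returned IDs would then be removed from the database in a separate step, so the lookup and the deletion stay easy to test on their own.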