Inside the #hot111 Bot Machine Room

kus | Blog, News, Technology

You might be wondering how our bot creates the #hot111 charts. Here are some simplified insights.

Every Sunday our bot runs on our test system and generates the charts. We manually check the results for spam, stats cheaters and errors.

Every Monday our bot runs in production. It takes 31 steps; here’s a simplified breakdown into thirteen. The whole generation takes around two hours.

Step one

It fetches all Creative Commons licensed tracks via the APIs of Jamendo, SoundCloud, Free Music Archive and Archive.org and writes the data to our database. That’s 50,000 tracks.

Step two

It filters out all Creative Commons NC and ND tracks and keeps only the BY and BY-SA tracks.
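
A minimal sketch of such a license filter, in Python for illustration only. The field name `license` and the short license codes are assumptions; the post does not show how the bot stores license data.

```python
# Hypothetical license codes; the bot's real data model is not shown in this post.
ALLOWED_LICENSES = {"by", "by-sa"}


def is_allowed(license_code: str) -> bool:
    """Return True for CC BY and CC BY-SA, i.e. licenses without an NC or ND clause."""
    return license_code.strip().lower() in ALLOWED_LICENSES


def filter_by_license(tracks):
    """Keep only the tracks whose (assumed) 'license' field is allowed."""
    return [track for track in tracks if is_allowed(track["license"])]
```

Using an allow-list rather than a block-list means an unknown or misspelled license code is dropped instead of slipping through.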

Step three

The APIs have some glitches, so it deletes all duplicates.
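
One way to picture the duplicate removal, as a sketch only: treat two records as the same track when they share a platform and a platform-side ID. The fields `source` and `source_id` are assumptions, and the real bot may compare other attributes.

```python
def deduplicate(tracks):
    """Return the tracks with later duplicates removed, keeping the first occurrence.

    Two records count as duplicates when their (source, source_id) pair matches.
    Both field names are assumptions made for this sketch.
    """
    seen = set()
    unique = []
    for track in tracks:
        key = (track["source"], track["source_id"])
        if key not in seen:
            seen.add(key)
            unique.append(track)
    return unique
```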

Step four

It generates the first basic HotRank and HotHotHotRank.

Step five

It marks all spam tracks and stats cheaters and applies the blacklist and whitelist.
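
How the two lists interact is not described here. The sketch below assumes one plausible rule: blacklisted artists are always dropped, and a whitelist entry overrides an automatic spam flag. The field names `artist` and `spam` are hypothetical.

```python
def apply_lists(tracks, blacklist, whitelist):
    """Filter tracks using an assumed blacklist/whitelist precedence.

    - Artists on the blacklist are always removed.
    - Tracks flagged as spam are removed unless the artist is whitelisted.
    """
    kept = []
    for track in tracks:
        artist = track["artist"]
        if artist in blacklist:
            continue
        if track.get("spam", False) and artist not in whitelist:
            continue
        kept.append(track)
    return kept
```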

Step six

It generates the HotRank and HotHotHotRank again, this time in depth.

Step seven

It purges all unpopular and irrelevant tracks. That leaves 1,000 tracks.
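
As a rough illustration, the purge can be thought of as a top-N cut. This sketch assumes the cut is made purely on a numeric `hot_rank` field where higher means more popular; the actual popularity and relevance criteria are not given in this post.

```python
def keep_top(tracks, limit=1000):
    """Return the `limit` tracks with the highest (assumed) 'hot_rank' value."""
    ranked = sorted(tracks, key=lambda track: track["hot_rank"], reverse=True)
    return ranked[:limit]
```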

Step eight

It gets the metadata via APIs from Facebook, Last.fm and YouTube.

Step nine

It maps the subgenres to main genres.
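
A lookup table is the simplest way to picture this mapping. The entries below are made-up examples, not the bot's real genre list, and the `other` fallback is an assumption.

```python
# Illustrative entries only; the real mapping is not published in this post.
GENRE_MAP = {
    "deep house": "electronic",
    "techno": "electronic",
    "indie rock": "rock",
    "trap": "hip-hop",
}


def main_genre(subgenre: str, default: str = "other") -> str:
    """Map a subgenre tag to its main genre, ignoring case and surrounding spaces."""
    return GENRE_MAP.get(subgenre.strip().lower(), default)
```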

Step ten

It fixes the SoundCloud stats.

Step eleven

It generates the HotRank and HotHotHotRank one last time.

Step twelve

It generates and caches the genre tag cloud.
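
A tag cloud usually boils down to counting how often each genre occurs and scaling the counts to a few display sizes. The sketch below shows that idea with a linear scale from 1 to 5; the scale and the function name are assumptions, and the caching part is left out.

```python
from collections import Counter


def tag_cloud(genres, min_size=1, max_size=5):
    """Return {genre: size}, scaling each genre's count linearly into min_size..max_size."""
    counts = Counter(genres)
    if not counts:
        return {}
    lowest, highest = min(counts.values()), max(counts.values())
    span = highest - lowest
    cloud = {}
    for genre, count in counts.items():
        if span == 0:
            # All genres are equally frequent, so they all get the smallest size.
            cloud[genre] = min_size
        else:
            cloud[genre] = min_size + round((count - lowest) * (max_size - min_size) / span)
    return cloud
```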

Step thirteen

It generates the sitemaps for the search engines.
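
For readers who have not seen one, a sitemap is a small XML file that lists page URLs in the sitemaps.org format. Here is a minimal sketch using Python's standard library; the function name and the URLs in the usage note are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc>...</loc></url> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")
```

Calling `build_sitemap(["https://example.com/track/1"])` returns a single `urlset` element containing one `url` entry. A production sitemap would also start with an XML declaration and stay within the protocol's per-file URL limit.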

Various

More steps are necessary, such as regularly keeping the tracks up to date and ensuring that tracks deleted on SoundCloud are also deleted from our database. The bot regularly checks for new YouTube videos, ensures that bot traffic doesn’t influence the internal stats, and recalculates the HotHotHotRank every hour.

The #hot111 is open source, built with the Symfony framework and running on a LEMP stack, and you can contribute on GitHub. Get in touch if you are interested.
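
Keeping bot traffic out of the stats can be done in many ways. The sketch below shows one common and fairly crude approach, which is not necessarily what #hot111 does: skip requests whose User-Agent header contains a typical crawler keyword. The keyword list and the `user_agent` field are assumptions.

```python
# Hypothetical keyword list; real crawler detection is usually more thorough.
BOT_MARKERS = ("bot", "crawler", "spider")


def is_bot(user_agent: str) -> bool:
    """Return True if the User-Agent string contains a known crawler keyword."""
    lowered = user_agent.lower()
    return any(marker in lowered for marker in BOT_MARKERS)


def count_human_plays(requests):
    """Count the requests that do not look like crawler traffic."""
    return sum(1 for request in requests if not is_bot(request["user_agent"]))
```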