Atlantic Reporter Exposes Four Music Datasets Feeding AI Training, Including Troves of 12 Million and 9 Million Tracks

Atlantic reporter Alex Reisner has identified four datasets of music being used to train artificial intelligence models and built a fully searchable public database from them.

The disclosure puts previously opaque training pipelines under direct scrutiny. Google and Stability AI have both confirmed their use of the datasets in published research papers.

Scale of the Datasets

Two of the four collections are vast by any measure. One contains roughly 12 million tracks; a second holds approximately 9 million.

The remaining two are smaller but still substantial, each exceeding 100,000 songs.
