Harvard Releases Massive Free AI Training Dataset, Reports Wired

by Lexi Aida

December 12, 2024

Harvard University announced it is releasing a dataset of nearly one million public-domain books to support AI development. As reported by Wired, the dataset was created by Harvard’s Institutional Data Initiative with funding from OpenAI and Microsoft.

The collection includes books that were scanned as part of the Google Books project and are no longer under copyright protection. The initiative aims to make AI development more accessible, and project leaders say it will help “level the playing field” in the industry.