Massively-Multi-Class NLP Classification Dataset

Each file title, without the '.json' extension, corresponds to a class ('~' replaces '/' in the title).

All files are in the following format:

    {'data': {'0': {'0': list of sentences (as tab-separated string) in first paragraph of first section,
                    '1': list of sentences (as tab-separated string) in second paragraph of first section,
                    . . },
              '1': {'0': list of sentences (as tab-separated string) in first paragraph of second section,
                    '1': list of sentences (as tab-separated string) in second paragraph of second section,
                    . . },
              . . },
     'sections': {'0': title of first section (as string),
                  '1': title of second section (as string),
                  . . }
    }

The dataset contains five subsets:
* full: 18199 Wikipedia pages containing at least 8 sections with at least 12 sentences each and an average sentence length of at least 5; further filtering was done on titles
* uni: 3084 pages in full with single-word titles
* wn: 3253 pages in full whose title corresponds to a WordNet synset
* mini: 1748 pages in the intersection of uni and wn
* tiny: 967 pages in mini with titles that are at most 12 characters long and correspond to at least one WordNet synset that is not a proper noun
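
The file format described above can be read with a few lines of Python. This is a minimal sketch, not part of the dataset's own tooling: the function names (`class_name_from_path`, `parse_page`, `load_page`) are hypothetical, and it assumes that each file is valid UTF-8 JSON, that the keys are consecutive integers stored as strings, and that every key in 'data' has a matching key in 'sections'.

```python
import json
from pathlib import Path


def class_name_from_path(path):
    """Recover the class label: drop the '.json' extension, map '~' back to '/'."""
    name = Path(path).name
    if name.endswith(".json"):
        name = name[: -len(".json")]
    return name.replace("~", "/")


def parse_page(page):
    """Convert one decoded file into a list of (section_title, paragraphs).

    Each paragraph is a list of sentences, obtained by splitting the
    tab-separated string stored in the file.
    """
    sections = []
    # Keys are strings such as '0', '1', ..., '10'; sort them numerically.
    for sec_key in sorted(page["data"], key=int):
        title = page["sections"].get(sec_key, "")  # assumption: keys line up
        raw_paragraphs = page["data"][sec_key]
        paragraphs = [
            raw_paragraphs[par_key].split("\t")
            for par_key in sorted(raw_paragraphs, key=int)
        ]
        sections.append((title, paragraphs))
    return sections


def load_page(path):
    """Read one class file and return (class_label, parsed_sections)."""
    with open(path, encoding="utf-8") as f:  # encoding is an assumption
        return class_name_from_path(path), parse_page(json.load(f))
```

For classification, the class label returned by `load_page` would serve as the target, and the sentences (or whole paragraphs) as the input text. The directory layout of the subsets is not specified here, so any path passed to `load_page` is up to the user.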