Text Classification

By classifying text, we aim to assign a document or piece of text to one or more classes or categories, making it easier to manage or sort. Categorizing and grouping text sources manually can be extremely laborious and time-consuming.

To classify documents automatically, it is necessary to identify the subjects or topics in a document to decide which category or class it belongs to. However, it is also important to consider certain entities that may also determine the classification, for example, the author.
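To make the idea concrete, here is a deliberately tiny sketch of topic-based classification with an entity signal. This is a toy illustration, not how the Classification endpoint works internally. The keyword lists, the author name "Jane Doe", and the `classify` function are all hypothetical. Topic words vote for a category, and a known author adds one extra vote for the category they usually write about.

```python
import re
from collections import Counter
from typing import Dict, Optional, Set

# Hypothetical keyword lists per category; a real classifier learns these from data.
CATEGORY_KEYWORDS: Dict[str, Set[str]] = {
    "sport": {"match", "goal", "league", "striker", "coach"},
    "politics": {"election", "minister", "parliament", "vote"},
}

# Hypothetical entity signal: authors known to write mostly about one category.
AUTHOR_CATEGORY: Dict[str, str] = {"Jane Doe": "politics"}


def classify(text: str, author: Optional[str] = None) -> Optional[str]:
    """Return the best-scoring category, or None if no keyword matched."""
    tokens = Counter(re.findall(r"[a-z]+", text.lower()))
    # Topic signal: count how often each category's keywords appear.
    scores = {cat: sum(tokens[w] for w in words) for cat, words in CATEGORY_KEYWORDS.items()}
    # Entity signal: a known author counts as one extra vote for their usual category.
    if author in AUTHOR_CATEGORY:
        scores[AUTHOR_CATEGORY[author]] += 1
    best = max(scores, key=lambda cat: scores[cat])
    return best if scores[best] > 0 else None
```

For example, `classify("The coach spoke about the vote", author="Jane Doe")` picks "politics", because the author vote breaks the tie between "coach" and "vote".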

With the Classification endpoint, you can automatically classify large numbers of documents or URLs into different categories. The API can classify text based on two different taxonomies.

The IPTC SubjectCode-based classifier can identify up to 500 categories based on the IPTC International Subject News Codes, which help to understand and tag news and media sources.
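IPTC subject codes are 8-digit numbers arranged hierarchically: the first two digits identify the top-level subject, the next three the subject matter, and the last three the subject detail, with zeros filling unused levels. For instance, 15000000 is the top-level "sport" subject, and 15054000 (sport – soccer) refines it. The sketch below, with hypothetical helper names, splits a code into its levels and lists its ancestors, which is handy for rolling fine-grained results up into broader categories.

```python
from typing import List, Tuple


def split_iptc_subject_code(code: str) -> Tuple[str, str, str]:
    """Split an 8-digit IPTC subject code into (subject, subject matter, subject detail)."""
    if len(code) != 8 or not code.isdigit():
        raise ValueError(f"not an 8-digit IPTC subject code: {code!r}")
    return code[:2], code[2:5], code[5:]


def iptc_ancestry(code: str) -> List[str]:
    """Return codes from the broadest subject down to `code` itself."""
    subject, matter, detail = split_iptc_subject_code(code)
    chain = [subject + "000000"]  # top-level subject, e.g. 15000000
    if matter != "000":
        chain.append(subject + matter + "000")  # subject matter level
    if detail != "000":
        chain.append(code)  # subject detail level
    return chain
```

With this, `iptc_ancestry("15054000")` yields the sport code followed by the soccer code.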

The IAB QAG classifier identifies categories based on the Interactive Advertising Bureau’s Quality Assurance Guidelines (QAG) taxonomy, which makes it easier for ad tech solution providers to categorize web pages, articles, and blogs effectively.
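IAB category lists conventionally identify a tier-1 category as `IAB<n>` (for example `IAB1`) and a tier-2 subcategory as `IAB<n>-<m>` (for example `IAB1-1`). Assuming the classifier reports IAB categories in that form (check the API reference for the exact format it returns), a small parser like the hypothetical `parse_iab_id` below recovers the two tiers so results can be grouped by their parent category.

```python
import re
from typing import Optional, Tuple

# Assumed ID format: "IAB<tier-1>" or "IAB<tier-1>-<tier-2>", e.g. "IAB1" or "IAB1-1".
_IAB_ID = re.compile(r"IAB(\d+)(?:-(\d+))?")


def parse_iab_id(category_id: str) -> Tuple[int, Optional[int]]:
    """Return (tier-1 number, tier-2 number or None) for an IAB-style category ID."""
    match = _IAB_ID.fullmatch(category_id)
    if match is None:
        raise ValueError(f"unrecognised IAB category ID: {category_id!r}")
    tier1, tier2 = match.groups()
    return int(tier1), (int(tier2) if tier2 is not None else None)
```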

Classification Example:

Let’s say we want to classify the following article:

GET /classify?url=https://www.bbc.com/sport/0/football/25912393
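As a minimal sketch of issuing this request from Python's standard library: the article URL is percent-encoded into the `url` query parameter, and a `urllib.request.Request` is built without being sent. `BASE_URL` is a hypothetical placeholder, and authentication is left to the caller through `headers`, since this sketch makes no assumption about the credentials the API expects.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder; substitute the API's real base URL.
BASE_URL = "https://api.example.com"


def build_classify_path(article_url: str) -> str:
    """Return the /classify path with the article URL percent-encoded as `url`."""
    return "/classify?" + urlencode({"url": article_url})


def build_classify_request(article_url: str, headers: dict) -> Request:
    """Build (but do not send) a GET request; `headers` should carry any credentials."""
    return Request(BASE_URL + build_classify_path(article_url), headers=headers, method="GET")
```

Passing the result to `urllib.request.urlopen` would send it. Encoding the article URL keeps its own `/` and `:` characters from being confused with the API's path and query syntax.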

The response tells us that the article is about “sport – soccer”, and also provides its IPTC standard category code, 15054000, along with a confidence score.
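To work with such a result programmatically, the sketch below parses a JSON body and orders the categories by confidence. It assumes the body contains a `categories` array whose items have `label`, `code`, and `confidence` fields; that shape is an assumption for illustration, so check it against the API reference. The `Category` and `parse_categories` names are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    label: str         # e.g. "sport - soccer"
    code: str          # taxonomy code, e.g. an IPTC subject code
    confidence: float  # classifier's confidence score


def parse_categories(body: str) -> List[Category]:
    """Parse the assumed {"categories": [...]} shape, highest confidence first."""
    data = json.loads(body)
    categories = [
        Category(label=c["label"], code=str(c["code"]), confidence=float(c["confidence"]))
        for c in data.get("categories", [])
    ]
    return sorted(categories, key=lambda c: c.confidence, reverse=True)
```

Given a response body, `parse_categories(body)[0].code` would then be the code of the most confident category.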

A complete list of all supported taxonomies and their details is available in the API documentation.