Anyone using news content, in almost any context, will already be familiar with the concept of tagging. As consumers, we see the tags applied to news articles every day, and in some cases we will use them ourselves to find articles relating to subjects we are particularly interested in.
For anyone in the news business, meanwhile, tags are essential. They help organize and distribute content. They make interrogation, investigation and analysis of events more effective, and they help us provide a better service to our readers and to consumers of published news content. Essentially, they help ensure that the right news, whether in a news app or a discovery process, is put in front of the right people at the right time.
What are category tags and why do we need them?
The simple answer to the first question is that tags are metadata that communicate the subject matter of a news item, or the category in which it sits.
Whilst news items will normally come with basic metadata appended, such as author, date published and publication, this metadata is of little use when we need to categorize stories by subject matter.
And of course, that is a common requirement anywhere news is handled or consumed. To name just four examples:
- When providing a news feed or service to customers, the utility of that service is greatly improved by effective category tagging.
- For news providers, category tags support analytics around which types of news are most popular, and which are less appreciated.
- Tags can be used to ensure that appropriate advertisements are displayed alongside news content, increasing click rates and revenues.
- For organizations managing risk, tags enable news relating to specific subjects to be surfaced accurately and consistently.
So in summary, tags are added to articles in order to make them easier to organize, distribute and analyze in a consistent, effective way. They are part of the process of turning unstructured data (just some words) into structured data (an article with metadata appended that tells us what it is, and what it is about). As such, tagging is a vital aspect of turning news into a usable resource.
How should tags be added to news articles?
If we assume that adding tags to the news content we will use is a good idea (and it is), then the next question is how best to apply them.
Broadly speaking, there are three approaches:
- Add tags to news content manually
- Use existing tags added by publishers
- Use Machine Learning and Natural Language Processing (NLP)
The first option runs into familiar, well-understood problems.
The first of these is scalability, which is such a deal-breaker that we could probably reject this approach on that basis alone. Adding tags manually takes time. A lot of time. And when time is money, a lot of time is a lot of money.
But the problems don’t end there, of course. Any manual process will almost certainly introduce inconsistencies. Some items will be tagged with many categories, others with just one. Because categories will not necessarily be defined ahead of time, and may be ignored even when they are, multiple tags for the same thing will inevitably end up existing.
To take an example close to home, the subject ‘Ireland’ could be (and almost certainly WILL be) tagged using:
- Ireland
- Republic of Ireland
- ROI
And probably many other alternatives. What this means, of course, is that tags will no longer do the job they are supposed to do - find every item relating to a particular subject. So all our effort has gone to waste.
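One common mitigation is a controlled vocabulary that maps every known variant onto a single canonical tag. The sketch below is purely illustrative: the alias table `CANONICAL_TAGS` and the function `normalize_tags` are hypothetical, and a real table would have to be curated by hand, which is itself manual work.

```python
# Hypothetical alias table: every known variant maps to one canonical tag.
CANONICAL_TAGS = {
    "ireland": "Ireland",
    "republic of ireland": "Ireland",
    "roi": "Ireland",
}


def normalize_tags(raw_tags):
    """Map free-text tags onto canonical tags, keeping unknown tags as-is.

    Matching ignores case and extra whitespace. Duplicates are removed
    while preserving first-seen order.
    """
    result = []
    for tag in raw_tags:
        key = " ".join(tag.lower().split())
        canonical = CANONICAL_TAGS.get(key, tag.strip())
        if canonical not in result:
            result.append(canonical)
    return result


print(normalize_tags(["ROI", "Republic of Ireland", "Eire"]))
# ['Ireland', 'Eire']
```

Note that "Eire" slips through because nobody anticipated it: an alias table only fixes the variants someone already knew about, so the consistency problem never fully goes away.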
Lastly, as with any manual process, people get lazy. They reuse the same tags over and over, take the simplest option, and allow inertia to set in - with a corresponding decrease in accuracy.
Simply put: manual tagging just doesn’t work.
The second option, inheriting tags from publishers, often runs into similar problems. In particular, when individual journalists add tags to copy themselves, it is almost an open invitation to inconsistency and incompleteness. Individuals have their own subjective perspectives, after all, and what one person might tag as ‘Business Failures’, another might tag as ‘Bankruptcy and Insolvency’.
That issue is compounded by the fact that there is almost certainly no consistency across publishers. Each will tend to use category tags in non-standard, proprietary ways, which means any news application integrating content from multiple sources (which is almost all of them) will suffer from inconsistent tagging to an even greater extent.
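An application that combines several sources has to translate each publisher's tags into one shared scheme before they can be used together. A minimal sketch of such a crosswalk follows; the publisher names, their tags and the shared labels are all invented for illustration.

```python
# Hypothetical crosswalk from (publisher, publisher-specific tag) to a shared
# label. Publisher names and tags are invented for illustration.
CROSSWALK = {
    ("publisher_a", "Business Failures"): "Bankruptcy and Insolvency",
    ("publisher_b", "Insolvency"): "Bankruptcy and Insolvency",
    ("publisher_b", "Bankruptcies"): "Bankruptcy and Insolvency",
}


def to_shared_taxonomy(publisher, tags):
    """Translate one publisher's tags into shared labels.

    Returns (mapped, unmapped) so tags without a crosswalk entry can be
    reviewed rather than silently dropped.
    """
    mapped, unmapped = [], []
    for tag in tags:
        label = CROSSWALK.get((publisher, tag))
        if label is None:
            unmapped.append(tag)
        elif label not in mapped:
            mapped.append(label)
    return mapped, unmapped


print(to_shared_taxonomy("publisher_a", ["Business Failures", "Retail"]))
# (['Bankruptcy and Insolvency'], ['Retail'])
```

The crosswalk needs entries for every publisher and every tag that publisher uses, so its maintenance cost grows with each new source.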
In addition, publisher tags are often hard to access as metadata. They are typically intended for internal use and are not necessarily set up to be shared with third parties.
So with manual tagging and publisher tags both ruled out as unworkable, the remaining alternative is automation using intelligent tagging techniques.
Categorizing content with AI
AI and Natural Language Processing (NLP) can be leveraged to analyze the text of any given news item, understand what it is about, and tag it accurately as a result. Specifically, NLP models can evaluate not just the words, but also the syntax and semantic context of the text, both within and across sentences. They can also classify entities into types (people, places, organizations and so on), and even identify entities and categories that are not explicitly referenced in the article.
Because a model does this in a way that is consistent by definition, the NLP approach enables articles to be tagged automatically, at whatever level of granularity is required, in a way that is uniform and scalable (because tagging is now done by machine). In other words, it is the most appropriate way to tag content at scale.
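To make the consistency point concrete, here is a deliberately simple stand-in: a keyword-rule tagger, not an NLP model. The category names and trigger words are hypothetical. It cannot read syntax or semantic context, but it shares the property described above: given the same text, it always returns the same tags, however many articles it processes.

```python
import re

# Hypothetical category rules: each category lists single-word triggers.
# Real NLP classifiers learn from context rather than matching keywords.
CATEGORY_RULES = {
    "Bankruptcy and Insolvency": ["bankruptcy", "insolvency", "insolvent"],
    "Mergers and Acquisitions": ["acquisition", "merger", "takeover"],
}


def tag_article(text):
    """Return, sorted, every category with a trigger word in the text.

    Matching is on whole words and ignores case, so identical input
    always yields identical tags.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(
        category
        for category, triggers in CATEGORY_RULES.items()
        if any(trigger in words for trigger in triggers)
    )


print(tag_article("The retailer entered insolvency after a failed takeover bid."))
# ['Bankruptcy and Insolvency', 'Mergers and Acquisitions']
```

Its limits are also instructive: it would miss "bankruptcies", or a story that describes an insolvency without naming it, which is the gap context-aware NLP models are meant to close.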
The AYLIEN News API approach
In our News API we use proprietary Machine Learning and NLP models to add a collection of data enrichments to every article we ingest. These include topical tags, which can be used to provide contextual information about an individual article or collections of articles in customer news feeds or, more commonly, as search parameters that enable accurate, scalable search.
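The following is not the News API's interface. It is a local, in-memory sketch, assuming each article already carries sets of category and industry tags, of why consistent tags make search precise: a query becomes a set-containment check rather than a free-text search. The `Article` class and `search` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical minimal article record with machine-applied tags."""
    title: str
    categories: set = field(default_factory=set)
    industries: set = field(default_factory=set)


def search(articles, categories=(), industries=()):
    """Return articles carrying ALL requested category and industry tags."""
    wanted_categories, wanted_industries = set(categories), set(industries)
    return [
        a for a in articles
        if wanted_categories <= a.categories and wanted_industries <= a.industries
    ]


feed = [
    Article("Chain files for bankruptcy", {"Bankruptcy and Insolvency"}, {"Retail"}),
    Article("Bank posts record profit", {"Earnings"}, {"Banking"}),
]
print([a.title for a in search(feed, categories=["Bankruptcy and Insolvency"])])
# ['Chain files for bankruptcy']
```

An empty query matches every article, since the empty set is a subset of any tag set.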
Specifically, we tag content according to three taxonomies: two industry-standard taxonomies, IPTC News Codes and IAB QAG, and, more powerfully, our proprietary AYLIEN Smart Tagger, which offers far greater breadth and depth of both industry and category tags. Let's take a quick look at each of the categorization options:
Option 1. IPTC News Codes
The international standard in news media for tagging news content, IPTC News Codes is a collection of 1,400 categories with multiple levels of depth. By using this taxonomy, we ensure that metadata applied to news content is consistent with publishing industry standards, and thus integrates seamlessly with third-party services or platforms when necessary. You can search for relevant categories and tags in our interactive table here.
Option 2. IAB QAG
This classification feature automatically categorizes articles into hierarchical groups based on the IAB quality assurance (IAB QAG) guidelines. IAB QAG is seen as the industry standard taxonomy in the digital advertising and publishing space and is made up of ~400 multi-level tags.
Option 3. AYLIEN Smart Tagger
Smart Tagger is a significantly more sophisticated classification model that has been built and trained specifically for finance, risk and media intelligence use cases. This far more granular list of industry and topic tags provides greater accuracy and precision, depth and breadth, and domain relevance.
Smart Tagger contains two additional taxonomies:
- Industries: articles are automatically tagged with one or more relevant industry tags from a detailed multi-tier industry taxonomy of ~1,500 potential tags.
- Categories: articles are automatically tagged with one or more topical and event-based category tags from a multi-tier category taxonomy of ~3,000 potential tags.
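All three taxonomies are multi-level. A common way to use such a hierarchy, shown here as a minimal sketch with an invented taxonomy fragment rather than any of the real ones, is to store each tag's parent and expand an article's tags upward, so that a query on a broad tier also returns articles tagged with narrower ones. `PARENT`, `with_ancestors` and `expand` are hypothetical names.

```python
# Hypothetical taxonomy fragment: child -> parent. Labels are invented and
# not taken from IPTC, IAB QAG or Smart Tagger. Assumes a tree (no cycles).
PARENT = {
    "Bankruptcy and Insolvency": "Corporate Events",
    "Mergers and Acquisitions": "Corporate Events",
    "Corporate Events": "Business",
}


def with_ancestors(tag):
    """Return the tag followed by each of its ancestors, nearest first."""
    path = [tag]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def expand(tags):
    """Expand leaf tags so an article also matches broader-tier queries."""
    expanded = []
    for tag in tags:
        for t in with_ancestors(tag):
            if t not in expanded:
                expanded.append(t)
    return expanded


print(expand(["Bankruptcy and Insolvency", "Mergers and Acquisitions"]))
# ['Bankruptcy and Insolvency', 'Corporate Events', 'Business', 'Mergers and Acquisitions']
```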
For more information about Smart Tagger, take a look at this page.