Earlier this year we announced ‘Enhanced Entities’, a major step forward for AYLIEN News API. Now, we are further augmenting the capabilities of entities, to help our customers pinpoint the news that matters, by introducing new advanced search features: Entity Prominence and Entity Frequency. Searching for articles using either Entity Prominence or Frequency enables users to retrieve articles that are significantly more relevant to their entities of interest.
Here's a short video using our demo environment showing the value of Entity Prominence:
Now we'll take a deeper look at these new features:
Entity Prominence
What is it? First, let’s define Entity Prominence. Put simply, it’s a measure of how prominent an entity is in an article, based on:
- Whether the entity is mentioned in the article title
- How close to the top of the article the entity is first mentioned
- How many times the entity is mentioned in the article
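To make the three signals listed above concrete, here is a minimal Python sketch that extracts them from a title and body. It is an illustration only: the function name, the `ProminenceSignals` type, and the simple case-insensitive string matching are our own assumptions, and the sketch does not reproduce how the News API actually detects entities or combines these signals into a prominence score.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProminenceSignals:
    """Hypothetical container for the three signals named in the text."""
    in_title: bool
    # Relative offset of the first body mention: 0.0 is the very top,
    # values near 1.0 are the very end, None means the body never mentions it.
    first_mention_position: Optional[float]
    mention_count: int


def prominence_signals(entity: str, title: str, body: str) -> ProminenceSignals:
    """Collect the raw signals for one entity using plain substring matching.

    Assumption: a literal, case-insensitive match stands in for real entity
    recognition, which would also resolve aliases and ambiguous names.
    """
    pattern = re.compile(re.escape(entity), re.IGNORECASE)
    body_matches = list(pattern.finditer(body))
    first_position = (
        body_matches[0].start() / len(body) if body_matches else None
    )
    return ProminenceSignals(
        in_title=pattern.search(title) is not None,
        first_mention_position=first_position,
        mention_count=len(body_matches),
    )


if __name__ == "__main__":
    signals = prominence_signals(
        "Microsoft",
        "Microsoft Stock Hits New High",
        "Shares of Microsoft rose today. Analysts said Microsoft may climb further.",
    )
    print(signals)
```

An article that scores well on all three signals (named in the title, mentioned early, mentioned often) is the kind of story a high prominence threshold is meant to surface.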
Here’s why it matters: Sometimes when we search for news content about specific entities, we end up pulling back stories that are not particularly relevant to the entities that we want to read about. Using Entity Prominence in a query can filter out such stories, effectively separating the wheat from the chaff and leaving only the stories most relevant to our entities of interest.
For example, if we’re following what’s happening in the news with Microsoft, we may query the News API with a search like 'entities:{{surface_forms.text: "Microsoft"}}'. Take a look at the top 5 results for this query:
Rank | Title |
---|---|
1 | Evolve IP Recognized in 2021 Gartner Market Guide for Desktop as a Service for Third Consecutive Year |
2 | Smart Cities Market 2021 Research including Growth Factors, Global Survey, Development Strategy |
3 | Evolve IP Recognized in 2021 Gartner Market Guide for Desktop as a Service for Third Consecutive Year |
4 | Tanla net profit soars 33%, reaches an all-time high |
5 | Enterprise Semantic Search Software Market to Witness Revolutionary Growth by 2027 \| SharePoint, IBM, Lucidworks, Microsoft FAST, Oracle, Amazon CloudSearch, Apache Lucene, and Attivio |
Microsoft appears in only one of these titles, and there only as part of a product name in a vendor list ('Microsoft FAST'), which indicates low prominence in the top returned articles. Mentions of Microsoft are most likely incidental in these articles.
However, if we apply the Entity Prominence parameter to the same query, 'entities:{{surface_forms.text: "Microsoft" AND overall_prominence: [0.9 TO *]}}', Microsoft appears in all five of the top returned article titles.
Rank | Title |
---|---|
1 | Enterprise Semantic Search Software Market to Witness Revolutionary Growth by 2027 \| SharePoint, IBM, Lucidworks, Microsoft FAST, Oracle, Amazon CloudSearch, Apache Lucene, and Attivio |
2 | Dow Jones Today, Stocks Open Mixed, Rebound Wobbles; Crocs, OneMain, CSX Rally; Microsoft's Pre-Earnings Target Hike |
3 | The Microsoft Game Development Kit is now available for free on GitHub |
4 | Microsoft Stock Hits New High as Street Raises Price Targets Ahead of Earnings |
5 | Microsoft's cyber startup spending spree continues with CloudKnox acquisition |
Applying Entity Prominence has clearly had a positive impact on the relevance of our returned articles.
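If you build queries in code rather than by hand, the prominence filter shown above can be assembled with a small helper. The sketch below is a minimal example under stated assumptions: `entity_prominence_query` is a hypothetical name of our own, it only produces the query string used in this post, and it does not send anything to the API.

```python
def entity_prominence_query(surface_form: str, min_prominence: float) -> str:
    """Build an entity query that keeps only prominent mentions.

    The field names (surface_forms.text, overall_prominence) and the
    open-ended range syntax are copied from the example query in this post.
    The helper itself is a hypothetical convenience, not part of any SDK.
    """
    # The outer braces are plain (non-f) strings, so "{{" stays a literal "{{".
    return (
        "entities:{{"
        f'surface_forms.text: "{surface_form}" '
        f"AND overall_prominence: [{min_prominence} TO *]"
        "}}"
    )


if __name__ == "__main__":
    # Reproduces the Microsoft query from the example above.
    print(entity_prominence_query("Microsoft", 0.9))
```

Keeping the threshold as an argument makes it easy to loosen the filter (for example, a lower minimum) if a strict value returns too few stories for a less widely covered entity.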
Entity Frequency
What is it? This feature is simply the number of times an entity is mentioned in an article. The more often an entity is mentioned, the more important it is likely to be in the article. We can now leverage this feature to search for articles focussed on our relevant entities.
If we want to track the recent events around Donald Trump, we previously would have used the following query: 'entities:{{surface_forms.text: Trump}}'
The titles below are from the top 5 search results given our query. From these titles, there’s not a lot of evidence to support these articles being about Trump himself. This is reflected in the number of times ‘Trump’ is mentioned in each article.
Title | 'Trump' Frequency |
---|---|
'Don't want to get vaccinated, leave' — hedge fund founder mandates Covid shots in his office – CNBC | 1 |
Aaron Frazer Always Asks for the Semi-Secret Sauce | 1 |
EXCLUSIVE 'QAnon Shaman' in plea negotiations after mental health diagnosis -lawyer | 4 |
Jets' assistant coach dies days after being struck by car while biking | 2 |
Trump loyalists echo false Arizona election fraud claims in hopes of winning midterms | 1 |
If we leverage the "overall_frequency" parameter in our search, using the query 'entities:{{surface_forms.text: Trump AND overall_frequency: [5 TO *]}}', the returned articles are clearly more relevant to Trump. This is evident from the top 5 returned article titles in the table below.
Title | 'Trump' Frequency |
---|---|
Trump, too, was 'unsettled' when Rudy Giuliani's hair dye melted and dripped down his face: book | 9 |
Trump's PAC raises $75 million, spends $0 on election audit efforts | 16 |
What We Know About Addison Rae And Donald Trump | 9 |
Trump's sway tested in race for open mid-Ohio US House seat | 10 |
Biden jumps back into campaigning with McAuliffe rally in Virginia | 13 |
Like Entity Prominence, Entity Frequency brings the most relevant articles to the fore, meaning we don’t have to sift through articles to find the gold among the gravel.
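The frequency filter can be built the same way and then URL-encoded for a request. This is a rough sketch, not a definitive client: the helper names are hypothetical, and the `aql` and `per_page` parameter names are assumptions that you should check against the News API documentation before relying on them. The surface form is quoted here, as in the Microsoft example, so that multi-word names stay together.

```python
import urllib.parse


def entity_frequency_query(surface_form: str, min_frequency: int) -> str:
    """Build an entity query that requires at least `min_frequency` mentions.

    Field names (surface_forms.text, overall_frequency) come from the example
    query in this post. The helper is a hypothetical convenience.
    """
    if min_frequency < 1:
        raise ValueError("min_frequency must be at least 1")
    return (
        "entities:{{"
        f'surface_forms.text: "{surface_form}" '
        f"AND overall_frequency: [{min_frequency} TO *]"
        "}}"
    )


def build_query_string(aql: str, per_page: int = 5) -> str:
    """URL-encode the query for use in a request.

    Assumption: the query travels in a parameter named `aql` and the page
    size in `per_page`. Verify both names in the API reference.
    """
    return urllib.parse.urlencode({"aql": aql, "per_page": per_page})


if __name__ == "__main__":
    query = entity_frequency_query("Trump", 5)
    print(query)
    print(build_query_string(query))
```

Encoding with `urllib.parse.urlencode` matters here because the query contains braces, brackets, quotes, and spaces, none of which are safe to place in a URL unescaped.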
AYLIEN News API has the most comprehensive entity capabilities on the market, and we have released multiple new enhancements to help users track the entities they care about, with more to come this year. If you would like to learn more about this set of features, or would like to suggest other features, please reach out to your customer success representative. In the meantime, read more about how to get the most out of entities on our doc site, or try a 14-day free trial by signing up here.