At AYLIEN, we empower people to understand what’s happening around the world by providing access to a vast, live database of enriched news content that increases in size by about one million new articles every day. We ingest, analyze, and index content from over 80,000 sources, across over 120 countries, and in 16 languages.
With the benefit of adding about 20GB of new data every day comes the challenge of filtering all of this data in order to find the most relevant articles. Sure, we indexed 1,000 new articles about Dogecoin yesterday, but how do you find the exact story that will help you make the right decision about your portfolio tomorrow?
Put simply, you can have all of the data in the world, but unless you have the tools to get the right piece of that data at the right time, it’s useless.
With this challenge in mind, we’re continually developing better tools for our users to easily find their signal among the noise of global news content, to help them find the exact story that they need at a given time, whether that article was published five minutes ago or five years ago.
This week, we’re proud to release two more of these tools - Keyword Boosting and Proximity Search. If you like to keep your searches simple and stick with keywords (but you really should check out our entity features), these tools can supercharge your results - and all you need to leverage them are two extra keystrokes.
We search news content to get the most relevant articles for what we need. But when we use a list of keywords in a search, sometimes one of the keywords is more relevant than the others. In simple keyword searches, this is a problem because all searched keywords are treated as equally important.
Boosting enables us to essentially put our finger on the scales when the articles’ relevance to our search is being calculated, giving extra weight to a specific keyword. This means we can find articles that meet all of the search criteria, but stories that mention this keyword more prominently will be boosted to the top of the results.
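To make the idea concrete, here is a minimal toy sketch of how a boost can change ranking. It is not how the News API actually scores articles: the `toy_score` function, the weights, and the two sample texts are all made up for illustration, and relevance is reduced to a weighted count of keyword mentions.

```python
def toy_score(text, weights):
    """Toy relevance: sum of (mentions of keyword) * (keyword weight).

    Illustrative only; real search engines use far richer scoring.
    """
    lowered = text.lower()
    return sum(lowered.count(keyword) * weight for keyword, weight in weights.items())


# Two invented article snippets (hypothetical data, not real results).
articles = {
    "bank_story": (
        "Greensill losses hit Credit Suisse. "
        "Greensill lender Credit Suisse was advised by David Cameron."
    ),
    "cameron_story": (
        "David Cameron faces questions. David Cameron lobbied for Greensill."
    ),
}

# Every keyword counts the same.
equal = {"greensill": 1.0, "credit suisse": 1.0, "david cameron": 1.0}
# Same keywords, but "david cameron" is weighted ten times higher.
boosted = {**equal, "david cameron": 10.0}


def rank(weights):
    """Return article names ordered from highest to lowest toy score."""
    return sorted(articles, key=lambda name: toy_score(articles[name], weights), reverse=True)


print(rank(equal))    # bank_story ranks first: 5 mentions vs 3
print(rank(boosted))  # cameron_story ranks first: 21 vs 14
```

With equal weights, the article that repeats the bank names wins; with the boost, the article centred on the boosted keyword moves to the top.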
Take, for example, David Cameron’s recent involvement in the Greensill/Credit Suisse scandal. A simple keyword search for stories like this would look like `text:("Greensill" OR "Credit Suisse") AND "David Cameron"`. But simple keyword searches give equal weight to each keyword, so you will get stories that mention David Cameron in passing and the other two keywords frequently. Take a look at the top 5 results for this query:
|1||Greensill Capital risk is $2.3bn, says Credit Suisse|
|2||Credit Suisse scraps bonuses and replaces two bosses after Greensill and Archegos losses|
|3||David Cameron ‘in line to make £200m from Greensill flotation’|
|4||Greensill lender Credit Suisse suffers 'unacceptable' loss|
|5||David Cameron ‘was in line to make £200m from Greensill flotation’|
|Query:||("credit suisse" OR "Greensill") AND "David Cameron"|
We can see from the titles that the results are relevant to all the keywords used in the search. However, an article that mentions "Credit Suisse" or "Greensill" frequently and "David Cameron" only tangentially is often ranked above a story that features "David Cameron" more prominently than the other two.
Boosting lets you tip the scales when the News API is assessing which articles are most relevant to your search. By searching `text:("Greensill" OR "Credit Suisse") AND "David Cameron"^10`, you are making "David Cameron" ten times more important than the other keywords, which pushes articles about Cameron’s involvement higher up the results order:
|1||David Cameron welcomes inquiry into Greensill lobbying|
|2||Greensill: The questions still facing David Cameron|
|3||Greensill lobbying row: Rishi Sunak texts to David Cameron released|
|4||The growing list of questions for David Cameron|
|5||Greensill: What is the lobbying scandal and why is David Cameron involved?|
|Query:||("credit suisse" OR "Greensill") AND "David Cameron"^10|
Above we can see far more mentions of Cameron in the titles, suggesting that the articles are more relevant to David Cameron’s involvement.
And if you want to see even more relevant content, try combining boosting with AYLIEN’s `title` parameter: `text:("Greensill" OR "Credit Suisse") AND title:"David Cameron"^10`. Look at how relevant these stories are to David Cameron’s involvement in the scandal, all from using keywords alone:
|1||David Cameron faces unprecedented formal inquiry into Greensill scandal | David Cameron|
|2||David Cameron lobbying: What is the David Cameron-Greensill row all about?|
|3||BRITAIN POLITICS DAVID CAMERON GREENSILL|
|4||David Cameron kept pushing Bank and Treasury to risk £20bn to help Greensill | David Cameron|
|5||Timeline: David Cameron and Greensill Capital|
|Query:||text:("credit suisse" OR "Greensill") AND title:"David Cameron"^10|
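If you build queries like this in code, a few small string helpers keep the quoting and operators consistent. The sketch below only assembles the query string shown above; the helper names (`phrase`, `boost`, `any_of`, `in_field`) are our own for illustration and are not part of any AYLIEN SDK.

```python
def phrase(term):
    """Wrap a keyword in straight double quotes so multi-word terms stay together."""
    return f'"{term}"'


def boost(term, factor):
    """Append the ^ operator and a boost factor to a quoted keyword."""
    return f"{phrase(term)}^{factor}"


def any_of(*terms):
    """Group several quoted keywords with OR inside parentheses."""
    return "(" + " OR ".join(phrase(term) for term in terms) + ")"


def in_field(field, clause):
    """Scope a clause to a field such as text or title."""
    return f"{field}:{clause}"


query = " AND ".join([
    in_field("text", any_of("Greensill", "Credit Suisse")),
    in_field("title", boost("David Cameron", 10)),
])

print(query)
# text:("Greensill" OR "Credit Suisse") AND title:"David Cameron"^10
```

Note the straight quotes: curly quotes pasted from a word processor are a common reason a hand-typed query fails to parse.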
This is just one use case for boosting keywords. Keep an eye on our blog for more how-tos on this, or better yet, grab an API key on our free trial and test it out on content that’s important to you today!
In the sea of news articles published every day, the companies, people, and things you care about are referred to in many different forms. A regional subsidiary will not always be referred to by its full name, an event can be described in multiple ways, and with simple keyword searches alone we would have to know all of these permutations ahead of time to catch mentions of these different forms.
Proximity search makes this challenge easier to overcome. By setting how far apart your keywords are allowed to appear, you can broaden or narrow a search so that it catches the many different ways the thing you are looking for can be expressed.
For example, if we want to look not at HSBC in general, but HSBC’s activities in China or its Chinese subsidiary, we can search for stories that mention HSBC and China within a given distance. This is a broader search than “HSBC China”, but a narrower search than searching for stories that simply mention “HSBC” and “China”. Take a look at the mentions of the searched keywords in the results:
|...HSBC Bank (China) Company Limited...|
|...the HSBC-GIF China Consumption Opportunities Fund...|
|...Bank of China, HSBC, and China Construction Bank...|
|...State Council of China, EBRD, HSBC, Iber...|
|...HSBC raised China Southern Airlines from a “hold”...|
In the table above, we can see that HSBC is referred to in different ways and we are catching mentions beyond simply “HSBC China”. We can also see that we are catching mentions of HSBC’s activity in China.
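Here is a rough sketch of what "within a given distance" means in practice. We assume the Lucene-style form `"HSBC China"~5` (a quoted phrase followed by `~` and the maximum distance), so check the Advanced Keyword Operators documentation for the exact syntax. The `within_distance` function is a simplified local illustration that counts the words between the two keywords in either order; it is not the News API's actual matching logic, and the last sample sentence is invented.

```python
import re


def proximity_query(first, second, max_distance):
    """Build a proximity query string (assumed Lucene-style syntax)."""
    return f'"{first} {second}"~{max_distance}'


def within_distance(text, first, second, max_distance):
    """Return True if the two single-word keywords appear with at most
    max_distance words between them, in either order (simplified)."""
    tokens = re.findall(r"\w+", text.lower())
    first_positions = [i for i, token in enumerate(tokens) if token == first.lower()]
    second_positions = [i for i, token in enumerate(tokens) if token == second.lower()]
    return any(
        abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
    )


print(proximity_query("HSBC", "China", 5))  # "HSBC China"~5

# Mentions taken from the table above: all fall within five words.
print(within_distance("HSBC Bank (China) Company Limited", "HSBC", "China", 5))
print(within_distance("HSBC raised China Southern Airlines", "HSBC", "China", 5))

# An invented sentence where the two keywords are far apart.
far_apart = (
    "HSBC reported quarterly earnings today while analysts in London "
    "debated a separate story about trade policy in China"
)
print(within_distance(far_apart, "HSBC", "China", 5))
```

A small distance keeps results close to the exact phrase, while a larger one admits looser mentions at the cost of more noise.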
In each of these examples, we got valuable articles for the queries we set up, and all it took was a couple of extra keystrokes: `^` or `~`, followed by the number we wanted to boost by or the maximum distance we wanted between the keywords. You can try these advanced keyword operators out yourself by grabbing a free API key, downloading an SDK or Postman collection, and then copying a code snippet from the docs.
Happy boosting! For further details and examples view our Advanced Keyword Operators support documentation.