Until 2013, the basis for Google rankings was websites, their content, keywords, and backlinks. With the Hummingbird update and the Google Knowledge Graph, Google began its transformation into a semantic search engine. Entities play a central role in this transformation with regard to indexing. In this part of my series of articles on semantics and entities at Google, I would like to explain why this is the case.
Table of contents
1 Entity-based indexing: From content-first index to entity-first index
2 My theses on entities and indexing
3 The advantages of an entity-based index
4 Challenges of Entity-Based Indexing
5 More exciting Google patents related to entity-based indexing
Entity-based indexing: From content-first index to entity-first index
First of all, I would like to look at a very detailed five-part article by my American colleague Cindy Krum. It serves as the introduction to my further thoughts on the topic of indexing and entities.
Cindy and her team have conducted a lot of tests and research to show that Google is increasingly concerned with understanding entities. She links this directly to the introduction of the Mobile First Index. At the heart of her argument is language. According to Cindy, Google wants to understand entities regardless of language.
With Google's new Entity based understanding, the language of the entity and content does not matter as much anymore – at least in some languages, and for some queries. Content can be clustered in the index based on the entity understanding, without being omitted because it is in the wrong language.
According to Cindy's assumption, the new mobile index is based on the information from the Knowledge Graph, which is why she calls it the Entity First Index. Here, the content, documents, and entities related to a main entity are subordinated to that main entity and arranged in an entity hierarchy.
The relationships between the elements are no longer established via the link graph but via the knowledge graph. In its current form, the link graph would eventually no longer scale with the growing amount of content and the number of different platforms.
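To make the idea of an entity-first index more concrete, here is a minimal Python sketch. It is purely illustrative: the class, the entity IDs (borrowed from Wikidata's Q-identifier style), the relation names, and the sample URLs are my own assumptions, not Google's actual data model. The point is simply that documents are clustered under the entity they describe, regardless of their language, and that entities are connected to each other by knowledge-graph-style relations rather than by links between documents.

```python
from collections import defaultdict

class EntityFirstIndex:
    """Illustrative sketch: documents grouped under entities,
    entities connected by named relations (not by hyperlinks)."""

    def __init__(self):
        # entity ID -> list of (document URL, language code)
        self.documents = defaultdict(list)
        # entity ID -> list of (relation name, related entity ID)
        self.relations = defaultdict(list)

    def add_document(self, entity_id, url, language):
        self.documents[entity_id].append((url, language))

    def relate(self, entity_id, relation, other_entity_id):
        self.relations[entity_id].append((relation, other_entity_id))

    def cluster(self, entity_id):
        """All documents about an entity, across all languages."""
        return self.documents[entity_id]

# Hypothetical example data
index = EntityFirstIndex()
index.add_document("Q762", "https://example.com/leonardo-da-vinci", "en")
index.add_document("Q762", "https://example.de/leonardo-da-vinci", "de")
index.relate("Q762", "notable_work", "Q12418")  # e.g. the Mona Lisa

# Both documents belong to the same cluster, despite different languages
print(len(index.cluster("Q762")))
```

In this model, an English and a German article about the same entity land in the same cluster, which is exactly the language-independence Cindy describes above.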
Continuing to organize and surface content based on the Link Graph is just not scalable for Google's long-term understanding of information and the web and it is definitely questionable in terms of the development of AI and multi-dimensional search responses that go beyond the browser…
Constantly crawling and indexing content based on something as easy to manipulate as the Link Graph and as fluid as language is hard, resource intensive, and inefficient for Google; And it would only grow more inefficient over time, as the amount of information on the web continues to grow.
Finally, most crucially for Google's long term goals, Google would not be able to benefit from the multiplier effect that 'aggregation of ALL the information' could have for the volume of machine learning and artificial intelligence training data that could be processed by their systems, if only they could get around the problem of language … And this is why entities are so powerful!