Entities are, in my not-so-humble opinion, the single most important concept to understand in SEO right now. Full stop.
Think I’m just another SEO professional spouting the latest “silver bullet” that will die on the table along with many before it?
Three of the most important ranking factors, at last disclosure, were content, links, and RankBrain.
All these areas have evolved since that disclosure, but there’s a good chance their overall importance has held.
We know that an entity is defined by Google as:
“A thing or concept that is singular, unique, well-defined and distinguishable.”
It’s important to understand that the thing does not need to be a physical object; it can also be a color, a date, an idea, and more.
An entity is anything that is singular, unique, well-defined, and distinguishable.
Now let’s look again at these three ranking factors.
Content, from an SEO perspective, is the connection of entities by relationships.
In the statement “SEO is dead,” there is the entity “SEO,” there is the entity “dead,” and there is the relationship that connects one to the other, along with the direction of that relationship.
All content is fundamentally like this.
Links are, at their very core, a connection between entities, and they were so even before we (or Google) thought of them as such.
They declare a relationship and direction between pages on the web. Those pages are entities that contain other entities.
Further, the entity of the anchor text is connected through a relationship to a topic (also an entity) and that topical entity is then connected via a directed relationship (the link) to the entity of the target page.
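To make the idea of “entities connected by directed relationships” concrete, here’s a minimal Python sketch. It illustrates the concept only and says nothing about how Google stores anything: the `Edge` type, the `reachable` helper, and the example anchor, topic, and page names are all hypothetical.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # entity the relationship starts from
    relation: str  # label for the relationship
    target: str    # entity the relationship points to


# Content: the statement "SEO is dead" as one directed relationship.
content_edges = [Edge("SEO", "is", "dead")]

# Links: anchor text -> topic -> target page (hypothetical names).
link_edges = [
    Edge("anchor: 'blue widgets'", "is about", "topic: blue widgets"),
    Edge("topic: blue widgets", "links to", "page: example.com/widgets"),
]


def reachable(edges, start):
    """Follow directed edges from `start` and return every entity reached."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for edge in edges:
            if edge.source == node and edge.target not in seen:
                seen.add(edge.target)
                frontier.append(edge.target)
    return seen


print(reachable(link_edges, "anchor: 'blue widgets'"))
```

Direction matters here: starting from the anchor text you reach the topic and the target page, but starting from the target page you reach nothing.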
RankBrain is not a ranking factor in the traditional sense. Its job is not to act as a signal but rather to adjust which signals carry what weight.
For a query like [best holiday gifts], RankBrain would interpret which signals make the most sense to produce the best result.
Time is itself an entity, and its importance would be weighted more strongly here: a list from 2014 would be useless, no matter how many links it has.
For a query like “American civil war,” however, the entity of authority rank would be a more important factor than freshness.
Essentially, RankBrain itself simply determines which entity metrics and relationships are most important for a specific query.
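RankBrain’s internals aren’t public, so treat the following as a toy sketch of the idea only: the same signals scored under different weight profiles depending on the query type. The profile names, signal names, and every number are assumptions I made up for illustration.

```python
# Hypothetical weight profiles per query type; the numbers are invented
# for illustration and do not come from Google.
WEIGHTS = {
    "fresh":     {"freshness": 0.7, "authority": 0.2, "links": 0.1},
    "evergreen": {"freshness": 0.1, "authority": 0.6, "links": 0.3},
}


def score(signals, query_type):
    """Weighted sum of a page's signals under the profile for `query_type`."""
    weights = WEIGHTS[query_type]
    return sum(weights[name] * signals.get(name, 0.0) for name in weights)


# A heavily linked list from years ago versus a recent list with few links.
old_list = {"freshness": 0.0, "authority": 0.5, "links": 1.0}
new_list = {"freshness": 1.0, "authority": 0.5, "links": 0.2}

for query_type in WEIGHTS:
    print(query_type, score(old_list, query_type), score(new_list, query_type))
```

Under the “fresh” profile (think [best holiday gifts]) the recent list wins despite having fewer links; under the “evergreen” profile the older, better-linked page comes out ahead.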
What Do We Know About Entities?
The majority of what we know about entities (or at least, what I know) comes from some patents, some smart folks, and what makes sense.
While patents generally need to be read with a grain of salt, the ones I’ll be referencing below make so much sense there’s little doubt they’re incorporated into Google’s systems.
That said, there are various ways Google could be using these patents, and I’m not going to pretend I know specifically how. Instead, we’ll talk about them generally and the direction they lead us in.
Ranking Search Results Based on Entity Metrics
“Ranking Search Results Based on Entity Metrics” is the title of a patent granted to Google in 2015, and it was the first patent on entities I read. It was not the last.
I’ve written a full analysis of the patent elsewhere, but it involves a lot of reading and formulas, so in this article I’ll save you that and cut to the chase.
According to the patent, the ranking of entities for search involves considering four factors. They are:
- Relatedness. Relatedness is determined based on the co-occurrence of entities. Basically, if two entities are frequently referenced together on the web (for example, “Donald Trump” and “President”), a query for one can return the other as a single result. That happens because they appear together frequently enough, and on authoritative enough properties, to yield a single result. The same process connects other entities with the term when we pluralize it: each person in the results is an entity associated with the entity “President,” and thus, when the query is plural, we see all of them.
- Notability. Google uses a fairly simple formula (in the patent) to determine how notable an entity is. Leaving the formula aside, it breaks down like this: the more valuable an entity is (determined by things including links, reviews, mentions, and relevance) and the lower the value of the category or topic it’s competing in, the higher its notability. On the surface, this doesn’t sound altogether logical, but what it means is that if you’re a big fish in a small pond, you have higher notability than if you’re that same fish swimming in the ocean.
- Contribution. Contribution is determined by external signals (e.g., links, reviews) and is basically a measure of an entity’s contribution to a topic. A review from a well-established and respected food critic would add more to this metric than Dave’s rant on Yelp about the price because the critic’s entity contribution in the space is higher.
- Prizes. The prize metric is exactly what it sounds like: a measure of the various relevant prizes an entity has received. These could be a Nobel Prize, an Oscar, or a U.S. Search Award. The type of prize determines its weight, and the larger the prize, the higher the value attached to the entity in question.
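The big-fish-small-pond effect in the notability metric is easy to see with a little arithmetic. The patent has its own formula, which I’m skipping; the plain ratio below is a simplification I’m assuming purely to show the direction of the effect, and the `notability` function and its numbers are hypothetical.

```python
def notability(entity_value, category_value):
    """Ratio sketch: a more valuable entity or a less valuable category
    both push notability up. This is an assumed simplification, not the
    formula from the patent."""
    if category_value <= 0:
        raise ValueError("category_value must be positive")
    return entity_value / category_value


fish = 50.0  # same entity value in both cases (invented number)

in_pond = notability(fish, category_value=100.0)      # small category
in_ocean = notability(fish, category_value=10_000.0)  # huge category

print(in_pond, in_ocean)
```

The entity’s own value never changes; only the size of the category it competes in does, and that alone moves its notability.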
When all is said and done, the process begins with the user requesting information on an entity.
I might enter [best actresses] into Google.
After that, Google runs through its process in this order:
- Determine the relatedness of other entities and assign values.
- Determine the notability of those entities and assign a value to each.
- Determine the contribution metrics of these entities and assign a value.
- Determine any prizes awarded to the entities and assign a value.
- Determine the applicable weights each should have based on the query type (sound familiar?)
- Determine a final score for each possible entity.
- Produce a SERP that looks like…
Hey, we didn’t say their algorithms were flawless. But not bad.
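The steps above can be sketched as a weighted score followed by a sort. This is a rough illustration under my own assumptions, not the patent’s actual math: the candidate names, the metric values (on a 0 to 1 scale), and the weights are all placeholders.

```python
# Hypothetical per-entity metrics on a 0..1 scale; placeholder names and
# numbers, not real data.
candidates = {
    "Actress A": {"relatedness": 0.9, "notability": 0.8, "contribution": 0.7, "prizes": 0.9},
    "Actress B": {"relatedness": 0.8, "notability": 0.6, "contribution": 0.9, "prizes": 0.3},
    "Actress C": {"relatedness": 0.4, "notability": 0.9, "contribution": 0.2, "prizes": 0.1},
}

# Assumed weights for a "best of" style query (the query-type step).
weights = {"relatedness": 0.4, "notability": 0.2, "contribution": 0.2, "prizes": 0.2}


def final_score(metrics, weights):
    """Combine the four metrics into one score using the query's weights."""
    return sum(weights[name] * metrics[name] for name in weights)


def rank(candidates, weights):
    """Return entity names ordered from highest to lowest final score."""
    return sorted(
        candidates,
        key=lambda name: final_score(candidates[name], weights),
        reverse=True,
    )


ranking = rank(candidates, weights)
print(ranking)
```

Change the weights (say, push `prizes` up for an awards-focused query) and the same candidates can come back in a different order, which is the whole point of the query-type step.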
Question Answering Using Entity References in Unstructured Data
Moving forward a bit in time, we’re now looking at a patent granted in 2016.
This patent contains some powerful ideas and, thankfully, is easy to summarize.
Here are the takeaways:
- Each entity is assigned a unique identifier. This likely relates to Google’s acquisition of Metaweb in 2010 (which Bill Slawski has covered).
- Determining the most likely entity being requested by a searcher can be completed by establishing which entity appears the most times in the top 10 results. For example, if someone searches [dave davies], most of the top-ranking sites are referring to the entity of the Kinks guitarist, and so that’s the entity used for things like the knowledge panel.
- There is an entity database. To save Google having to process the top results every time a query is run, a database exists that simply stores entities and their connections. Think of it like a link database, but for entities.
- Entities are ranked by a quality score that may include freshness, previous selections by users, incoming links, and possibly outgoing links. Remember, this is just the patent – don’t run out and link to every site you can find. I find that part unlikely to carry weight outside of very specific situations.
- When a query for an entity is conducted, the relevance of other entities is determined for the result. To illustrate, for the query [dave davies], Google needs to determine which entity metrics relate most importantly to it. The entity of birth date is deemed important, and the entities of his brother, his band, and a number of others are important enough to make the knowledge panel. That he was born eighth in his family is not deemed important enough. This is not to imply the importance of entities relates only to knowledge panels, just that it’s one of the clearest visual illustrations of it.
- There are methods for Google to infer context for multiple entities with the same name. To use their example, there is Philadelphia the city, the cream cheese, and the movie. If I ask a “where” question I would be referring to the city, “who acted in” would be the movie, and “what goes well with” would be the food. The answer, by the way, is lox, red onion, and capers.
- This technique allows Google to determine entities and their relationship when data is unstructured (referring to information that either does not have a pre-defined data model or is not organized in a pre-defined manner).
- This method also allows Google to learn new entities.
With this technique, Google’s capabilities around learning about entities and their relationships become significantly stronger.
Combine that with their advances in understanding natural language and machine learning, and the importance of entities jumps forward even more.
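Two of the takeaways above lend themselves to a quick sketch: picking the entity that shows up most often in the top results, and using the wording of a question to choose between entities that share a name. This is a toy illustration under my own assumptions; the entity IDs, the cue phrases, and both helper functions are hypothetical and far simpler than anything a search engine would run.

```python
from collections import Counter


def most_likely_entity(entity_ids):
    """Return the entity ID that appears most often, or None if empty."""
    if not entity_ids:
        return None
    return Counter(entity_ids).most_common(1)[0][0]


# Hypothetical entity IDs referenced by the top 10 results for [dave davies].
top_10 = ["dave_davies_kinks"] * 8 + ["dave_davies_other"] * 2

# Hypothetical question cues mapped to the Philadelphia entity they imply.
CONTEXT_CUES = {
    "where": "philadelphia_city",
    "who acted in": "philadelphia_movie",
    "what goes well with": "philadelphia_cream_cheese",
}


def disambiguate(question):
    """Pick an entity from the opening words of the question, if any match."""
    text = question.lower().strip()
    for cue, entity_id in CONTEXT_CUES.items():
        if text.startswith(cue):
            return entity_id
    return None


print(most_likely_entity(top_10))
print(disambiguate("Who acted in Philadelphia?"))
```

A bare query like “Philadelphia” matches no cue here and returns `None`, which is where a fallback such as the top-results count would come in.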
The last patent we’re going to discuss here is simply titled, “Related Entities,” and was granted in its current form earlier this year.
Here’s what we can take away from the patent:
- There is a mechanism for determining entity relationship priority. As was noted above, the order that Dave Davies entered life in his family is a known entity but is not prioritized over other entities determined to provide a higher probability of interest to the searcher.
- Stronger sites like Wikipedia provide a stronger relationship between entities. For example, a Wikipedia page discussing Ronald Reagan as the president of the U.S. would connect the two entities of “Ronald Reagan” and “President” far more strongly than their mentions in this article, which appears on a site whose topical authority relates to SEO and marketing.
As you can see, the patent itself is quick to summarize but the ideas within it are incredibly powerful.
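One way to picture “stronger sites provide a stronger relationship” is to let each co-mention of two entities count in proportion to the authority of the source. The sketch below assumes exactly that and nothing more: the `relationship_strength` helper and the authority numbers are hypothetical, and the patent does not spell out this calculation.

```python
from collections import defaultdict


def relationship_strength(mentions):
    """Sum source authority for every co-mentioned entity pair.

    `mentions` is an iterable of (entity_a, entity_b, source_authority).
    Pairs are stored in sorted order so (a, b) and (b, a) count together.
    """
    strength = defaultdict(float)
    for entity_a, entity_b, authority in mentions:
        pair = tuple(sorted((entity_a, entity_b)))
        strength[pair] += authority
    return dict(strength)


# Invented authority values: a reference site versus an SEO article.
mentions = [
    ("Ronald Reagan", "President", 0.9),  # e.g. an encyclopedia page
    ("President", "Ronald Reagan", 0.1),  # e.g. a passing mention in an SEO post
    ("Dave Davies", "SEO", 0.1),          # e.g. this article's odd pairing
]

strength = relationship_strength(mentions)
print(strength)
```

The Reagan and President pair ends up far stronger than the Dave Davies and SEO pair, because almost all of its weight comes from the high-authority source.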
Why Do Entities Matter for SEO?
Entities matter for SEO because, at their core, they are the world.
We ourselves understand everything around us in the context of entities and their relationships. We just tend not to think of it that way.
A big part of the reason we’re just starting to talk about this now is that it takes machine learning to make use of the concept at the search level.
Without machine learning, Google couldn’t understand language well enough to interpret pages and entity relationships.
Without machine learning, and RankBrain specifically, Google couldn’t learn how to prioritize signals accurately and on the fly, adjust for unknowns, and learn from them.
So now we’re starting to see this all come about and with it a massive change in how pages are ranked.
With entities come:
- The ability to calculate the probability of meeting the user’s likely intent with far greater accuracy.
- The ability to understand, from language and tone, whether a result will be positive or negative.
- A dramatically reduced reliance on links.
Links will remain a signal, I’m sure, but they will become simply one mechanism among many for establishing entity values.
To optimize in this new world, we need to change the way we think about our sites and how we market externally.
If we want to rank for “blue widgets,” we need to consider that Google can now, or soon will, understand all the various entities related to them and the order in which the searcher’s intent will most likely be met.
You now need to consider which entities you need on your site and how they need to be connected to maximize the probability that Google understands you are more likely than your competitors to meet the variety of possible intents.
Entities & Links
Perhaps more important for SEO professionals will be the change in links.
If I am right, and it seems inevitable, links will become simply one entity connector among many.
Why would a link be necessary to pass value if every other signal and a strong understanding of how entities relate is in place?
Google doesn’t need to see that I’m specifically linking to the site of Dave Davies of the Kinks.
They’ll know from context that this article references that entity and oddly ties it to a variety of entities such as Google Patent US20180046717A1, but their systems will determine that the relatedness is just not there, and the association between this article and the Kinks guitarist will be minimal.
One thought to take away is to consider every logical connection, regardless of the type of SEO you’re doing.
If you’re writing content, think of the other entities that should exist on the page or site and make sure they are there. Look at the top 10 sites and determine which other entities are on those pages.
And when you’re doing link building, think of the entities you’re most interested in associating yourself with and get links on those sites, knowing that even if links diminish in value, you’re still OK.
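If you already have a list of entities for your page and for each of the top-ranking pages, the comparison itself is simple. Here’s a small sketch that assumes the entity extraction has been done elsewhere (by hand or with whatever tool you prefer); the `missing_entities` helper, the `min_pages` threshold, and the sample data are all hypothetical.

```python
from collections import Counter


def missing_entities(my_entities, competitor_pages, min_pages=2):
    """Entities found on at least `min_pages` competitor pages but not mine."""
    counts = Counter()
    for page_entities in competitor_pages:
        counts.update(set(page_entities))  # count each entity once per page
    return sorted(
        entity
        for entity, pages in counts.items()
        if pages >= min_pages and entity not in my_entities
    )


# Invented example data for a "blue widgets" page.
my_page = {"blue widgets", "price"}
top_pages = [
    {"blue widgets", "price", "shipping"},
    {"blue widgets", "shipping", "warranty"},
    {"blue widgets", "warranty", "reviews"},
]

gaps = missing_entities(my_page, top_pages)
print(gaps)
```

Raising `min_pages` narrows the list to entities that nearly every competitor covers; lowering it to 1 surfaces one-off mentions as well.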
Are you a realtor in Miami? Get links on realty sites but also on sites related to Miami. You can take it a step further and think about the types of sites that also strongly relate to realty.
Mortgage brokers, for example, would have a strong entity association with real estate and thus make good second-tier entity references.
Entities Are Here to Stay
Entities are necessary for Google to hand us the information we demand when we’re requesting the entity “pizza” with the location relationship of the entity “near me.”
So order one and start thinking about what content you’re going to tackle next.