More Google Answer Boxes, with Bonus Experiment!
Last week, drowned out by the Panda 4.1 rollout, the MozCast Feature Graph detected a significant jump in the presence of answer boxes (+42% day-over-day, up to +44% on September 30th):
This measurement includes all types of “answer” boxes – direct answers, stock quotes, weather forecasts, box scores, and even the new, attributed answer boxes. Digging into the data, it appears that almost the entirety of the jump is in the new style of answer boxes. These are the answers that are extracted from third-party websites, and they look something like this:
The key distinction is that you’ll see a search-result-style title and link below the answer. Separating just this data, the same two-week graph looks like this:
The day-over-day increase from September 25-26 in new answer boxes was +98%, almost doubling the total number in our data set. This clearly represents a significant expansion in Google’s ability to extract and display answers.
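For anyone who wants to check the arithmetic, here is a minimal sketch of how a day-over-day percentage change is calculated. The function name and the counts are hypothetical and chosen only to illustrate a +98% jump; they are not the actual MozCast figures.

```python
def day_over_day_change(previous: float, current: float) -> float:
    """Return the percentage change from one day's count to the next."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    # Multiply before dividing to keep the arithmetic exact for whole counts.
    return (current - previous) * 100.0 / previous


# Hypothetical counts (not MozCast data): 100 answer boxes one day, 198 the next.
change = day_over_day_change(100, 198)
print(f"{change:+.0f}%")
```

A change of +100% would be an exact doubling, which is why +98% reads as "almost doubling."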
The “Winning” Queries
Over 100 queries picked up the new answer boxes in our data set. Below are 10 examples. Keep in mind that any given query may gain or lose its answer box for any given search, depending on factors such as search history, localization, and personalization:
- global warming
- project management
Many of these are general, informational answers, and quite a few of the new answer boxes in our data set seem to be coming directly from Wikipedia. With this update, Google also may have added a new capability – here’s the answer box for #3 above (“steampunk”):
The image on the right is being extracted directly from the article. While we’ve seen some examples of brand boxes with logos, the ability to directly add general images seems to be new. Other new answer boxes are more traditional, such as “mba”:
Many of these new queries seem to be broad, “head” queries, but that could be a result of our data set, which tends to be skewed toward shorter, commercial queries. One four-word query with a new answer box was “girl scout cookies types”:
It’s interesting to note that the more grammatically correct “girl scout cookie types” doesn’t seem to return an answer box. These new answers seem to be very dependent on query structure and how the query matches on-page keywords.
An Experiment in Answers
If Google is pulling more and more answers directly from the index (i.e. our sites), then it stands to reason we could update those answers. A couple of months ago, I noticed that one of my posts was producing an answer box for the search “how much does google make”:
Even as the author of this post, I had to admit that was a pretty terrible answer, especially being 3-4 years out of date. I quickly assembled a Twitter mob to deal with this problem (well, basically Ruth Burr Reedy and David Iwanow), and we unanimously decided something must be done:
I decided to edit the top of the post, adding a user-friendly update for new visitors that gave new numbers for 2013. This went up on July 10th – I posted the update on social, and by later that day the new page was cached.
Two weeks went by, and there was no change to the answer box. Naturally, I assumed this was because the old text was still in place (I had simply added new information). So, on July 24th, I carefully removed the old content (the text that appears in the answer box) and edited the META description. By the next day, the new page was cached and the new snippet was showing up in Google SERPs.
So, what does that answer box look like today, almost two months later? Look up four paragraphs, because it’s exactly the same. Even though the content used in this answer box is now completely gone, Google is still using it in search results.
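If you want to run the same kind of check on your own pages, one rough approach is to compare the answer box text against the current page text. The sketch below assumes you have already copied both strings by hand; the function names and the sample strings are hypothetical, and nothing here queries Google or reflects how Google extracts answers.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def snippet_still_on_page(snippet: str, page_text: str) -> bool:
    """Return True if the answer box snippet still appears in the page text."""
    return normalize(snippet) in normalize(page_text)


# Hypothetical strings for illustration only.
answer_box = "An old   answer that was removed"
current_page = "This page now contains updated numbers and nothing else."

if not snippet_still_on_page(answer_box, current_page):
    print("The answer box text no longer appears on the page.")
```

This is only an exact-match test after normalization, so a lightly reworded passage would also count as missing.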
While this is only one example, it seems to suggest that these answers are not being extracted and created in real-time – they’re being stored in some sort of internal Google knowledge base. This may sound familiar, if you’ve read anything over the last month about Google’s theoretical Knowledge Vault.
Unlike Freebase-based Knowledge panels and answers, this internal vault can’t be edited directly. Unlike organic results, where changes to our pages are generally reflected on the next crawl-and-cache, these answer boxes are being updated much less frequently. Since these new answers link directly to pages, they could be connecting to information that’s been mismatched for weeks or even months.
At this point, there’s very little anyone outside of Google can do but keep their eyes open. If this is truly the Knowledge Vault in action, it’s going to grow, impacting more queries and potentially drawing more traffic away from sites. At the same time, Google may be becoming more possessive of that information, and will probably try to remove any kind of direct, third-party editing (which is possible, if difficult, with the current Knowledge Graph).