Some Random Observations

Thursday, August 5th, 2010

By Bill Slawski, on July 2nd, 2010

People still read books. I started on Nudge: Improving Decisions About Health, Wealth, and Happiness not long ago. I’m about a fifth of the way through, and I’ve already added “Choice architecture” to my list of concepts to study more, and I’m looking more carefully at the choices I make.

Seeing a lot of intriguing search patents published by Yahoo over the past few months, and that’s made me sad. I don’t know if they will end up in the graveyard of unfulfilled intellectual property, or migrate to Redmond, Washington, with Microsoft taking over Yahoo’s search results.

My favorite baseball team is in first place in their division after more than a decade straight of losing seasons (Go Reds!). Part of the reason for their winning comes from a few trades that have turned out better than expected, and part comes from an improved minor league system. I can’t help thinking of that as I watch Yahoo search engineers move to Microsoft or begin startups of their own. Also wondering if the Yahoo/Bing search merger has helped to make Google stronger, especially when observing things like Yahoo’s Chief Scientist of Search choosing to join Google instead of Bing.

Seeing too many Search Engine Optimization tools that include keyword density calculators. Please stop.

My town holds their 16th annual Children and Pets Parade tomorrow morning. If you bring your dog or cat or other pet, you can march your pet down Main Street. Children in the parade are encouraged to be on wheels, whether strollers, bikes, wagons, scooters, tricycles, carts – anything without a motor. The mix of pets and kids on wheels is more fun to watch than fireworks.

Google’s webspam chief, Matt Cutts, made a post a couple of days ago on Webspam projects in 2010?, soliciting suggestions for what his readers might like to see Google focus upon to reduce webspam. Have you, or will you join the discussion? I have a few ideas in mind, myself.

I’m continuously puzzled by my State and Local governments’ failure to use the Web more intelligently. Public notices in my newspaper about hearings and solicitations for public commentary tell people to call or email to receive documents about proposed government actions. Not sure why they don’t publish a URL leading directly to those documents online, but they never do.

Sometimes I find myself saying under my breath, “140 characters isn’t enough,” when responding to a question on Twitter. Usually it is, though, in spite of my grumblings.

View the Original article

Categories : SEO Tips
Comments Comments Off

Bing's Categorized Search Results

Thursday, August 5th, 2010

43 comments

Mike
July 6, 2010 at 1:06 pm

Thanks! I am still trying to find out how to master Bing. While the features listed in this article are definitely cool, I just can’t seem to steer traffic to my site.

I bet you that 95% of my search traffic is from Google and Yahoo, and have been disappointed about the lack of SEO finds by Bing……..especially when you consider that they are supposed to be taking over the Yahoo search function.

Does anyone have some Bing pointers on how to better optimize for their search?

Bill Slawski
July 6, 2010 at 3:17 pm

Hi Mike,

The methods that the search engines use to rank pages all differ, but their ultimate goals are to deliver people to pages that are the best results that they can deliver based upon the relevance of those pages to queries used by searchers, and some measure of quality of those pages.

A number of whitepapers over the past couple of years from Microsoft have made it clear that Microsoft was using a RankNet algorithm to rank pages in their search results. One of the first papers from Microsoft that described that RankNet approach came out in 2005 – Learning to Rank using Gradient Descent (pdf). The paper gives us some hints at the kinds of things they might look at when creating a machine learning approach to ranking pages:

We report results on data used by an internet search engine. The data for a given query is constructed from that query and from a precomputed index. Query dependent features are extracted from the query combined with four different sources: the anchor text, the URL, the document title and the body of the text. Some additional query-independent features are also used. In all, we use 569 features, many of which are counts.
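For anyone who wants a feel for what “learning to rank using gradient descent” can mean in practice, here is a minimal sketch in Python of the pairwise idea behind RankNet: given two pages where one should outrank the other, nudge the scoring weights until it does. This is not Microsoft’s implementation. The paper trains a neural network over 569 features, while this toy uses a linear scorer, and the three features, the training pairs, the learning rate, and the function names are all assumptions made up for illustration.

```python
import math

def score(weights, features):
    """Linear scorer; a stand-in for the neural network used in the paper."""
    return sum(w * x for w, x in zip(weights, features))

def pairwise_loss(weights, better, worse):
    """Cross-entropy cost for a pair where `better` should outrank `worse`."""
    diff = score(weights, better) - score(weights, worse)
    return math.log(1.0 + math.exp(-diff))

def train(pairs, n_features, learning_rate=0.1, epochs=200):
    """Plain gradient descent over (better, worse) feature-vector pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = score(weights, better) - score(weights, worse)
            # Derivative of log(1 + exp(-diff)) with respect to diff.
            slope = -1.0 / (1.0 + math.exp(diff))
            for k in range(n_features):
                weights[k] -= learning_rate * slope * (better[k] - worse[k])
    return weights

# Hypothetical features: query-term counts in [title, anchor text, body].
training_pairs = [
    ([2, 3, 5], [0, 1, 4]),
    ([1, 2, 2], [1, 0, 3]),
]
learned = train(training_pairs, n_features=3)
```

After training, `score(learned, better)` should come out higher than `score(learned, worse)` for each training pair, which is all the pairwise cost asks for.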

I wrote about that paper and some other papers and patents from Microsoft, back in 2006, in the post Feature based rankings at MSN.

One of those papers hinted at some potential ranking factors that they might consider in the future, such as:

Other features could include the number of images on a page, size of those images, number of layout elements (tables, divs, and spans), use of style sheets, conforming to W3C standards (like XHTML 1.0 Strict), background color of a page, etc.

I suspect that Microsoft does use a number of measures that might surprise people, to attempt to actually understand the quality of webpages, such as reading levels associated with pages. So if they are looking at things like W3C validation, that might not be a surprise.

A couple of posts about Microsoft patents/papers that you might find interesting as well:

How a Search Engine Might Analyze the Linking Structure of a Web Site
Microsoft Granted Patent on Vision-Based Document Segmentation (VIPS)

Jey Pandian
July 6, 2010 at 3:20 pm

Hi Bill,

I enjoyed reading this post, and I’ve printed out the patent to examine it in greater detail later.

To answer your question about Bing’s categorized results: I used the author’s example of flash and other keywords to run multiple queries on Bing for “apple, iphone, blackberry, shoes, flash, ibm, google and san francisco” with personalization enabled and disabled.

With regards to the corporation and city specific names, I noticed these categories look eerily like the Google sitelinks. I could be wrong here but the category ordering for this small sample of keywords appears to be based off the order of the categories in the first result from an initial glance.

I wonder if Microsoft is serving these categories on basis of search volume. High search volume for a specific set of documents


By Bill Slawski, on July 13th, 2010

Information about where searchers hover their mouse pointers over different parts of search results, as well as advertisements and Google OneBox results, may be collected by the search engine to be used as ranking signals to determine in part how relevant those items may seem to Google users in response to a search query.

When I view the contents of a web page, I often find myself moving my mouse pointer along the areas that I am viewing. There are a couple of reasons for this. One is that it makes it easier to focus upon the part of the page that I’m looking at. Another is that it’s easier to click upon a link that I find interesting if my pointer is near what I’m viewing.

According to Google, I may not be alone in this kind of behavior. Google may track mouse movements on its search results pages to help rank pages that show up in search results, to determine the quality of sponsored ads within those search results, and to decide whether or not showing onebox results such as maps or definitions or news or stock quotes is appropriate for some search queries.

When Google ranks web pages, it considers a wide range of ranking signals, such as how relevant a page might be to keywords used by a searcher, the quality and quantity of links pointing to that page, and user-behavior data collected about that page.

A number of patent filings and whitepapers from Google have told us that Google might collect a fair amount of user-behavior data about how we browse web pages, such as: how long we might spend on pages, how far we might scroll down those pages, which pages we might click upon in search results, which pages we might not click upon, which links we might follow when we visit pages, if we print or bookmark or save pages, and more.

A newly granted patent from Google explores how they might look at how we move our mouse pointers on search results pages as a ranking signal.

Problems with Click Through Rates

One method of ranking search results based upon user-behavior is to see which pages people click upon when they perform a search at a search engine, and which ones they don’t. But, there’s a potential problem with that approach.

Let’s say that you search for Barack Obama’s birthday in Google, and the top result shows the birthdate in Google’s snippet. There’s no need to click through to the page, and your informational need is satisfied. Google probably wants that page to continue ranking well, but if the search engine relied upon click-throughs to measure the relevancy of a page for a query, then a lack of clicks to the page would make it seem to be not very relevant.

If Google instead looked to see that people hovered their mouse pointer over a snippet that contained the President’s birthdate, and then moved on to a completely different search, Google might consider that page to be very relevant to that query.

If that search result was the third or fourth listing in that set of search results, but a large number of people hovered their mouse pointers over that snippet, and then went on to a new set of queries, Google may rerank that particular search listing and move it to a more prominent place in the search results, possibly even to the first listing.

The Google patent is:

System and method for modulating search relevancy using pointer activity monitoring
Invented by Taher H. Haveliwala
Assigned to Google
US Patent 7,756,887
Granted July 13, 2010
Filed: February 16, 2005

Abstract

A method and system of modulating search result relevancy use various types of user browsing activities. In particular, a client assistant residing in a client computer monitors movements of a user controlled pointer in a web browser, e.g., when the pointer moves into a predefined region and when it moves out of the predefined region.

A server then determines a relevancy value between an informational item associated with the predefined region and a search query according to the pointer hover period. When preparing a new search result responsive to a search query, the server re-orders identified informational items in accordance with their respective relevancy values such that more relevant items appear before less relevant ones.

The server also uses the relevancy values to determine and/or adjust the content of an one-box result associated with a search query.
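To make the abstract a little more concrete, here is a minimal sketch in Python of the two steps it describes: turning pointer hover periods into a relevancy value for each query and result, and then re-ordering results by that value. The patent does not give a formula, so the use of a simple average hover time, the log format, and the function names here are all assumptions for illustration only.

```python
from collections import defaultdict

def relevancy_from_hovers(hover_log):
    """Average hover time in seconds for each (query, url) pair.

    hover_log is an iterable of (query, url, seconds) tuples, a
    hypothetical format standing in for whatever the client reports.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for query, url, seconds in hover_log:
        totals[(query, url)] += seconds
        counts[(query, url)] += 1
    return {key: totals[key] / counts[key] for key in totals}

def reorder(query, results, relevancy):
    """Put results with higher relevancy values first.

    Python's sort is stable, so results with equal values (including
    results with no hover data at all) keep their original order.
    """
    return sorted(
        results,
        key=lambda url: relevancy.get((query, url), 0.0),
        reverse=True,
    )
```

With a log in which searchers for one query linger over the third listing far longer than the first, `reorder` would move that third listing to the top, much like the birthday example above.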

The Value of Mouse Movements and Placement

When you are looking at a set of search results, Google may track where your mouse goes on their results page. They tell us that:

A typical user’s behavior is to move the mouse pointer (or any other pointing indicator) over or near a target informational item, keep the mouse pointer there for a period of time while the user reads the item’s information (e.g., title and snippet), and then click through the underlying link or move to another item.

Sometimes, a user may review multiple informational items responsive to a search query, moving a pointer over or near each of the informational items that the user reviews. These various pointer activities can provide another way to evaluate the user’s feedback with respect to a particular informational item.

The patent presents a couple of assumptions about how mouse pointer movements can be interpreted:

For example, a longer hover over a result may indicate a positive opinion about how relevant a listing on the results page might be to a query.

And, if someone moves their mouse pointer across a snippet line by line at a normal reading speed, it may indicate a higher level of attention to that result than if the pointer was kept in a static position or moved randomly.

So, the speed and movement of a mouse pointer as well as where it is placed on a search result page might be tracked to see how much attention a searcher pays to different search results. If someone hovers over one sponsored listing, or ad, but not another, that might indicate more attention and interest in the ad hovered over. If a local map is shown, or a definition, or some other OneBox result, and the searcher viewing the page hovers over those OneBox results for a while, that could be an indication that the map or the definition or other OneBox listing was helpful.

Client Attention Data

The patent refers to information about the tracking of mouse pointer movements as “client attention data,” because this kind of measuring of browsing activities can give the search engine an idea of how much interest there is in different parts of a search result page, and how much attention visitors paid to each of those parts. If there are similar patterns about how a large number of viewers interacted with a page, that data may provide some meaningful information that can be acted upon by the search engine.

The patent also tells us that it might give different weights in determining a relevancy value for mouse pointer movements based upon different areas of a result. If someone hovers over the title to a search result, that might carry a different amount of weight than if they hover over the snippet of a result.
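As a rough illustration of weighting different areas of a result differently, here is a small Python sketch. The patent doesn’t publish any actual weights, so the region names and the numbers below are purely hypothetical, as is the function name.

```python
# Hypothetical weights: a second spent over the title counts for more
# than a second spent over the snippet or the displayed URL.
REGION_WEIGHTS = {"title": 1.0, "snippet": 0.6, "url": 0.3}

def weighted_attention(hovers):
    """Combine (region, seconds) hover records into one attention value.

    Regions that are not in REGION_WEIGHTS contribute nothing.
    """
    return sum(
        REGION_WEIGHTS.get(region, 0.0) * seconds
        for region, seconds in hovers
    )
```

Under these made-up weights, two seconds over a title and five seconds over a snippet would count roughly the same as five seconds over the title alone.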

Conclusion

This patent was originally filed in 2005, and it’s possible that Google may not be using the methods described, or may have tested those methods and since moved on to other ways of tracking searchers’ attention on search results pages. It’s also possible that Google may be using those mouse pointer movements today.

Looking at how someone may move their mouse pointer across a page does provide more useful information to the search engine than just looking at which items on a search result page someone might click upon or not click upon. We haven’t seen too many patent filings from the search engines that go into so much depth on how they might measure one specific type of user-behavior and interpret it, like we do with this patent.

This patent was also filed before Google introduced Universal Search, and when it mentions OneBox results, it means those special results from other data repositories from Google such as Maps, News, Stock Quotes, etc., that were often shown at the tops of search results rather than blended into search results. It’s possible that a mouse pointer tracking approach could be used by Google to see how effective or useful blended results might be when they are located in places other than at the top of a set of search results as well.

If one sponsored listing, or ad, is hovered over for a while by many viewers, while others aren’t, should that play a role in the placement of an ad? Is that kind of user-behavior part of the quality score for sponsored listings?

Added (7/13/2010 – 12:30): Interesting observations on the Acuity blog about a June 2 presentation at Eyetrack UX in Belgium by Google Senior User Experience Researcher Anne Aula – Eye Gaze Data and the Correlation With Mouse Movement.

Added (7/14/2010 – 9:43 am): A paper to be presented next week at SIGIR’10 in Geneva, Switzerland, from researchers at Emory University is also worth a close look – Ready to Buy or Just Browsing? Detecting Web Searcher Goals from Interaction Data. It describes how user-behavior data such as mouse movements and scrolling on search results pages from search engines can be used to help understand the intent behind a search.


Google Gets Smarter with Named Entities: Acquires MetaWeb

Thursday, August 5th, 2010

By Bill Slawski, on July 17th, 2010

You may know him by a number of names or titles – Governor of California, Terminator, Governator, Conan the Barbarian, Kindergarten Cop, Mr. Universe, Mr. Olympia, Arnold Strong, Arnie, The Austrian Oak.

To Metaweb, Arnold Schwarzenegger is referred to as 9202a8c04000641f8000000000006567.

Who is Metaweb?

Metaweb is a company recently acquired by Google, and they’ve created a system of indexing named entities that allows you to search for information in a new way. Actually, the idea sounds a little like a library’s Dewey Decimal System, but for named entities. Why is this important, and what is a Named Entity?

A named entity is a specific person, place, or thing. For example, named entities can include Barack Obama, or the Commonwealth of Virginia, or the Great American Ballpark in Cincinnati. Associating unique identification numbers with named entities can make it easier to index them, and to find information about those named entities when they might be referred to by different names, like my example above about Arnold Schwarzenegger. They can also help with local search, by allowing specific places or businesses or landmarks to have unique identification numbers.
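Here is a minimal Python sketch of why a unique ID helps when one entity goes by many names. The ID is the one quoted above for Arnold Schwarzenegger; the lookup tables, the lowercase matching, and the function name are my own assumptions for illustration, and say nothing about how Metaweb actually stores or resolves names.

```python
# One canonical name maps to one ID (the ID is the one quoted in this post).
ENTITY_IDS = {
    "arnold schwarzenegger": "9202a8c04000641f8000000000006567",
}

# Hypothetical alias table: many surface names point at one canonical name.
ALIASES = {
    "governator": "arnold schwarzenegger",
    "the austrian oak": "arnold schwarzenegger",
    "arnold strong": "arnold schwarzenegger",
}

def resolve(name):
    """Return the entity ID for a name or alias, or None if unknown."""
    key = name.strip().lower()
    canonical = ALIASES.get(key, key)
    return ENTITY_IDS.get(canonical)
```

Anything indexed under the ID can then be found whether a searcher types “Governator” or “Arnold Strong”, since both resolve to the same identifier.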

How often do named entities appear in Web searches? A recent paper from Microsoft, Building Taxonomy of Web Search Intents for Name Entity Queries (pdf) tells us that they are pretty common:

According to an internal study of Microsoft, at least 20-30% of queries submitted to Bing search are simply name entities, and it is reported 71% of queries contain name entities.

Google announced their acquisition of Metaweb in an Official Google Blog post, Deeper understanding with Metaweb. Metaweb also announced the acquisition in their post, Metaweb joins Google.

Metaweb has a number of patent applications assigned to them at the United States Patent and Trademark Office, and they are worth diving into if you want to learn a little about some of the technology behind the company.

I’ve just started looking at them myself, beginning with the one below on “Query Optimization,” which is where I found the Metaweb ID number of Arnold Schwarzenegger. The patent filing describes how an ID number can be used to collect and store data about named entities, and information associated with them, and how queries can be performed based on that collected information.

Here are the patent filings assigned to Metaweb:

Automated online purchasing system
Invented by W. Daniel Hillis, Bran Ferren
US Patent Application 20030195834
Published October 16, 2003
Filed: September 18, 2002

Meta-Web
Invented by W. Daniel Hillis, Bran Ferren
US Patent Application 20040210602
Published October 21, 2004
Filed: December 15, 2003

Personalized profile for evaluating content
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20050131918
Published June 16, 2005
Filed: May 24, 2004

Delegated authority evaluation system
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20050131722
Published June 16, 2005
Filed: May 25, 2004

System and method to facilitate importation of user profile data over a network
Invented by W. Daniel Hillis and Bran Ferren
US Patent Application 20060095780
Published May 4, 2006
Filed: October 28, 2004

User Contributed Knowledge Database
Invented by Timothy Sturge, Kurt Bollacker, Robert Cook, John Giannandrea, Nicholas Thompson, Edwin Taylor
US Patent Application 20090024590
Published January 22, 2009
Filed: April 22, 2008

Graph Store
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100174692
Published July 8, 2010
Filed: January 20, 2010

Database Replication
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100121817
Published May 13, 2010
Filed: January 20, 2010

Query Optimization
Invented by Scott Meyer, Jutta Degener, Barak Michener, John Giannandrea
US Patent Application 20100121839
Published May 13, 2010
Filed: January 20, 2010

Knowledge Web
Invented by W. Daniel Hillis and Bran Ferren
Assigned to Metaweb Technologies, Inc.
US Patent 7,502,770
Granted March 10, 2009
Filed April 10, 2002

Conclusion

Metaweb operates Freebase, a community-based source of data about different people, places, and things. For a great example of how they collect and display data, see their page on George Washington.

What will Metaweb bring to Google?

That remains to be seen, but it’s possible that Metaweb’s technology might help make it easier for Google to associate information with named entities. As the Microsoft paper I mentioned above noted, searches for named entities make up a good percentage of searches on their search engine. Chances are that searches for named entities are fairly popular on Google as well. So the impact of the Metaweb acquisition could potentially be a large one.


Head URLs and Tail URLs and Bing's Supplemental Index?

Wednesday, August 4th, 2010

Blogging Guide
July 19, 2010 at 9:36 am

I agree with your last sentence, search engines can improve, update things here and there, undergo upgrades and many others as long as search results quality, efficiency and effectiveness is not affected but rather improved for the better.

Bill Slawski
July 19, 2010 at 2:44 pm

Hi Andrew,

Sometimes there is a tradeoff, where some rankings for some sites may suffer, or there might be unintended consequences that cause changes in rankings.

If you find your rankings for your pages changing unexpectedly, sometimes that might be a result of something that you have done, or that your competitors have done. And sometimes it might be because of a change implemented by one of the search engines.

SearchCap: The Day In Search, July 19, 2010
July 19, 2010 at 5:02 pm


The Importance of the Journey: Search Trails and Destination Pages

Wednesday, August 4th, 2010

By Bill Slawski, on July 20th, 2010

Two Microsoft papers being presented at this week’s SIGIR’10 conference in Geneva, Switzerland explore the topic of search trails – the pages that a searcher travels through after performing a search for a query before reaching a final destination page.

The idea of delivering searchers to a final destination page, a page where previous searchers for a specific query often ended up before they either stopped searching or changed the focus of their search, is something that Microsoft has explored in the past.
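As a rough sketch of the destination page idea, the Python below takes a set of search trails and finds, for each query, the page where searchers most often ended up. The trail format (a query paired with the ordered list of pages visited afterward) and the function name are assumptions of mine; the Microsoft work draws trails from browsing logs and is considerably more involved.

```python
from collections import Counter, defaultdict

def popular_destinations(trails):
    """Map each query to the page that most often ends its trails.

    trails is an iterable of (query, pages) pairs, where pages is the
    ordered list of pages visited after the query. Empty trails are skipped.
    """
    endings = defaultdict(Counter)
    for query, pages in trails:
        if pages:
            endings[query][pages[-1]] += 1
    return {
        query: counts.most_common(1)[0][0]
        for query, counts in endings.items()
    }
```

The pages in the middle of each list are the way stations along the trail, which this simple count ignores.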

I wrote about a patent filing from Microsoft a couple of years ago which explored how user behavior signals, such as how searchers browsed through pages to find information, might be used to rerank search results. The post, Search Trails: Destinations, Interactive Hubs, and Way Stations, took a look at how search trails – the pages browsed between an initial query and a final page visited – might offer useful query suggestions to searchers as well.

That patent filing, and the 2007 SIGIR best paper, Studying the Use of Popular Destinations to Enhance Web Search Interaction (pdf) by Ryen W. White, Mikhail Bilenko, and Silviu Cucerzan, focused more upon the final destination pages found than the pages visited along the way. Ryen White is listed as a co-author in the earlier papers and patent filing on search trails, and he is one of the authors listed on the papers presented this week in Switzerland as well.

It looks as though those intermediary pages may have some value as well, and the idea of including those somehow within search results may be worth exploring.

What route do searchers follow to get to a final destination page, and how important are the pages along the way? Might other searchers with similar information needs and situational tasks to fulfill benefit from a search engine showing the search trails that others follow? How would those trails best be shown in search results?

The papers are:

Assessing the Scenic Route: Measuring the Value of Search Trails in Web Logs (pdf) by Ryen W. White and Jeff Huang
Studying Trailfinding Algorithms for Enhanced Web Search (pdf) by Ryen W. White, Jeff Huang and Adish Singla

The authors of the papers tell us that ranking documents for specific keywords may be an easier task than helping people who have more complex informational needs. Those needs can include learning about a topic that someone doesn’t know much about but may want to become better informed on.

Some of the search behaviors cited in the papers describe how people might start with one search and travel through a number of pages after their initial query. The patterns used to meet those needs go by names such as information foraging, berry picking, and orienteering.

If you’re not familiar with those concepts, they are definitely worth exploring if you’re interested in learning some theories behind how people perform complex searches. The following are cited in the two recent Microsoft papers:

The Design of Browsing and Berrypicking Techniques for the Online Search Interface
Information Foraging
Orienteering in an Information Landscape: How Information Seekers Get From Here to There

Conclusion

Many search related papers about searchers’ behaviors focus upon query sessions – where searchers may start out with one query term or phrase and possibly perform additional searches adding words to make their search more specific, or removing words to make it more general, correcting spelling mistakes, or even switching over to related terms.

Some query session refinements include people adding geographic terms to their queries, which may indicate to a search engine that specific queries have geographic intents behind them. If those refinements happen frequently enough, they may trigger maps showing up in search results. Other query session refinements may help power some of the “Did you mean” type suggestions that you sometimes see when you search.

Looking at Search Trails may provide a whole different range of searcher behavior type information. By studying the pages that people travel down, from their selected page amongst search results to a final destination page, there may be information that can be related to that initial query that just isn’t captured by looking at query sessions and refinements alone.

Will Microsoft start showing some search trails that are often followed by searchers for specific queries in their search results?

It’s a possibility. It might be an interesting addition to the search results we see today, and could benefit people who don’t know too much about a specific topic, but are interested in exploring it more fully than someone else who may just be looking for a quick and simple answer in their search results.


How and Why Google Might Estimate the Number of Users Behind an IP Address

Wednesday, August 4th, 2010

By Bill Slawski, on July 25th, 2010

When you arrive at a web page, the owner of that page might start collecting information about your visit for a number of reasons. One of the most commonly collected pieces of information is an internet protocol (or IP) address. An IP address is a number that can be associated with the way and the place that you access the Web.

The Difficulties of Using an IP Address as a Data Point

Your IP address might be assigned to a server or a router that you use to connect to the Web, or a proxy server or firewall that stands between the computer that you are using and the rest of the internet. You might go online on a computer that you share with other people at home or at a public place like a library, or at an office filled with other computers. You might share an IP address with roommates or family on the same computer, or use more than one computer through the same IP address.

A unique IP address might be assigned to your internet access every time you dial into the internet, or may be leased by your router on a weekly basis through your broadband provider and may change if that lease isn’t automatically renewed by logging in within a certain amount of time after the lease period is over. If you access the web through an office, the IP address seen by the pages you visit might be that of your company’s firewall.

When you connect through a service such as AOL, you may share an IP address with many other people because you connect to the Web through a proxy server which may cache pages visited by others – so that if you visit a page that someone else has seen recently, you may see a cached copy of that page stored in the proxy server instead of visiting the server the page was published upon initially.

A patent granted to Google this week describes how the search engine might be able to estimate the number of people who might be accessing the web through individual IP addresses, using a number of different approaches.

Why a Search Engine Might Estimate Users Behind an IP Address

Why would Google want to be able to estimate how many people might be behind a single IP address and be able to distinguish between them if possible?

The patent tells us that there are a number of reasons – being able to estimate how many visitors come to your site from IP addresses can be useful in determining:

Whether or not visits to a page from one or more IP addresses are from a single user, or from multiple users;
Whether or not ads selected from one or more IP addresses are from a single user or from multiple users;
Whether or not server resources from one or more IP addresses are from a single user or from multiple users;
How many users access a Web page or Website;
How many users viewed certain ad impressions;
How frequently users visited pages.

There are a number of ways that Google might use this kind of information. For example, Google collects information about user-behavior during query sessions, so that they can see how a searcher might modify their queries when searching for information about a specific topic to try to understand the intent behind a search or a series of searches. There are a number of features that Google offers that benefit from being able to distinguish between different searchers, collecting that data from a large number of searchers, and aggregating and analyzing that information, such as:

Spelling correction suggestions,
Query refinement suggestions,
Determining whether there is a geographical location intent behind a search (and possibly showing maps and local business suggestions),
Personalization or customization of search results, and
Diversification of search results

Being able to understand which searches and other interactions with Google originate from IP addresses, and from specific users behind IP addresses, can also be useful in:

Trying to determine if click fraud is happening,
Determining whether searches, and clicks, and other interactions with search results and advertisements might be automated
Deciding whether searches, and clicks, and other interactions with search results and advertisements might be manual but evidence an intent to manipulate user-behavior data
Providing data to users of public tools from Google such as Google Analytics, Google Website Optimizer, Google’s Conversion Tracker
Analyzing trends in searches, for use with tools like Google Insights for Search, Google Trends, Google Trends for Websites, and Google Hot Trends
Analyzing trends for internal Google processes that might determine how popular (or bursty) some topics and some web sites might be, including news and blog results
Determining how popular a web site or advertisement might be
Determining how “Sticky” a site is
Collecting user-data to determine which sitelinks to show for a site
Running many other processes that rely upon distinguishing between individuals to track and measure user-behavior data

While Google uses information found on web pages and in links to web pages to determine the rankings of pages in search results, a number of patent filings and white papers over the past few years hint strongly that Google is also looking closely at user-behavior data to determine how much attention people are paying to different web pages, videos, news results, blogs, and other kinds of documents or objects on the Web. That public attention level may influence how well some sites may rank for different queries.

Cookies and Other Client Identifiers

When you visit a web page, the server it is on may send textual information to your browser, which your browser then sends back to that server every time you access that page. This information is known as a cookie, and it can be used to authenticate your identity (so that you don’t have to log in every time you visit a site), to track users, and to maintain specific information such as your preferences for a web site, what you’ve entered into a shopping cart, and more.

But there are people who purposefully disable cookies on their browsers to avoid being tracked.

Cookies can be part of the solution to estimating how many people might be accessing the web through a specific IP address, but there are other approaches that can help when people have turned cookies off in their browsers.

The patent refers to cookies and information about your browser and some other computer settings as client identifiers.

These browser parameters and “user-agent” parameters can include things such as:

Screen setting information such as screen height/width, available height/width, and color depth,
Time zone,
History length,
Whether or not Java is enabled,
Number of plug-ins,
Mime types,
Type of connection device or program connecting to the web, whether desktop or mobile browser, screen reader or braille browser,
Host operating system,
Language,
etc.

So, an estimate of the number of users who might be at a specific IP address could be created by looking at a ratio of unique sets of user agent and/or browser parameters for that IP address. Information about browser and other client parameters could be used to “differentiate different users.”
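To make that idea concrete, here is a minimal sketch in Python. It is not the patent's method: the patent speaks of a ratio of unique parameter sets, while this sketch simply counts the distinct sets seen at each IP address. The field names (`user_agent`, `screen`, `time_zone`, `language`, `plugin_count`) and the sample log entries are made up for illustration.

```python
from collections import defaultdict

def estimate_users_per_ip(requests):
    """Count the distinct sets of browser parameters seen at each IP address."""
    fingerprints = defaultdict(set)
    for req in requests:
        # Hypothetical field names; a real log would carry many more parameters.
        fingerprint = (
            req.get("user_agent"),
            req.get("screen"),
            req.get("time_zone"),
            req.get("language"),
            req.get("plugin_count"),
        )
        fingerprints[req["ip"]].add(fingerprint)
    # The number of distinct fingerprints serves as a rough user estimate.
    return {ip: len(fps) for ip, fps in fingerprints.items()}

# Invented sample data using documentation-range IP addresses.
sample_requests = [
    {"ip": "203.0.113.5", "user_agent": "BrowserA", "screen": "1280x800",
     "time_zone": -5, "language": "en-US", "plugin_count": 4},
    {"ip": "203.0.113.5", "user_agent": "BrowserA", "screen": "1280x800",
     "time_zone": -5, "language": "en-US", "plugin_count": 4},
    {"ip": "203.0.113.5", "user_agent": "BrowserB", "screen": "1024x768",
     "time_zone": -5, "language": "es-MX", "plugin_count": 9},
    {"ip": "198.51.100.7", "user_agent": "BrowserA", "screen": "1440x900",
     "time_zone": 1, "language": "de-DE", "plugin_count": 2},
]

estimates = estimate_users_per_ip(sample_requests)
```

Two people with identical setups would be counted as one here, and one person with two browsers as two, which is why an estimate like this would probably be combined with cookie data rather than used alone.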

The Google Patent is:

Determining a number of users behind a set of one or more internet protocol (IP) addresses
Invented by Deepak Jindal, Rama Ranganath, Gokul Rajaram, and Fong Shen
Assigned to Google
US Patent 7,761,558
Granted July 20, 2010
Filed June 30, 2006

The patent provides a fair amount of detail on how they might attempt to analyze both cookie and client identifier data to estimate how many different users might be accessing the web at different IP addresses, and some of the assumptions and rules that they might use within that analysis. Some examples include:

A single cookie at one IP address only is most likely a single user at that address, unless the computer is shared.
A single cookie appearing at a mix of IP addresses may be a single user whose IP address is dynamically changed each time they connect to the Web, or who moves between physical locations when connecting to the Web.
A single IP address with multiple cookies:
    with a small number of cookies over a period of time, could be a single user visiting through different browsers or computers, or someone who clears or resets their cookies regularly
    with a large number of cookies over a period of time, could be a number of people visiting through a proxy server
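Rules like these could be sketched roughly as follows. This is an illustration only: the patent does not give a cutoff between a "small" and a "large" number of cookies, so the `small_cookie_limit` value of 3, the label names, and the sample data are all assumptions.

```python
from collections import defaultdict

def interpret_observations(observations, small_cookie_limit=3):
    """Label IP addresses by cookie count and find cookies seen at several IPs.

    observations: iterable of (ip_address, cookie_id) pairs.
    small_cookie_limit: assumed cutoff, not a value from the patent.
    """
    cookies_by_ip = defaultdict(set)
    ips_by_cookie = defaultdict(set)
    for ip, cookie in observations:
        cookies_by_ip[ip].add(cookie)
        ips_by_cookie[cookie].add(ip)

    ip_labels = {}
    for ip, cookies in cookies_by_ip.items():
        if len(cookies) == 1:
            ip_labels[ip] = "single_cookie"  # likely one user, unless shared
        elif len(cookies) <= small_cookie_limit:
            ip_labels[ip] = "few_cookies"    # maybe one user, several browsers
        else:
            ip_labels[ip] = "many_cookies"   # maybe many users behind a proxy

    # Cookies seen at more than one IP: dynamic addresses or a user on the move.
    roaming_cookies = {c for c, ips in ips_by_cookie.items() if len(ips) > 1}
    return ip_labels, roaming_cookies

# Invented sample data.
sample = [
    ("203.0.113.1", "a"), ("203.0.113.2", "a"),
    ("203.0.113.3", "b"), ("203.0.113.3", "c"),
    ("203.0.113.4", "d"), ("203.0.113.4", "e"), ("203.0.113.4", "f"),
    ("203.0.113.4", "g"), ("203.0.113.4", "h"),
]
labels, roaming = interpret_observations(sample)
```

The labels are only starting points for interpretation. As the list above notes, the same pattern can have more than one explanation.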

Some cookies have a very short lifetime, of less than a day, and we’re told that those might be filtered out of this process because they may come from browsers that don’t accept cookies, or only accept session cookies, or are from first time visitors, or people who clear their cookies daily, or even spammers.
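A filter of that kind might look something like the sketch below. It assumes a cookie's lifetime can be approximated by the time between its first and last sighting, which is a simplification of my own rather than something the patent spells out, and the data layout is hypothetical.

```python
from datetime import datetime, timedelta

def drop_short_lived(cookie_sightings, min_lifetime=timedelta(days=1)):
    """Keep only cookies observed over a span of at least min_lifetime.

    cookie_sightings: dict mapping cookie_id -> list of datetimes at which
    the cookie was seen (a hypothetical layout chosen for this sketch).
    """
    kept = {}
    for cookie, times in cookie_sightings.items():
        # Approximate lifetime as the gap between first and last sighting.
        if max(times) - min(times) >= min_lifetime:
            kept[cookie] = times
    return kept

# Invented sample data.
sightings = {
    "short": [datetime(2010, 7, 1, 9, 0), datetime(2010, 7, 1, 17, 0)],
    "once": [datetime(2010, 7, 2, 12, 0)],
    "long": [datetime(2010, 7, 1, 9, 0), datetime(2010, 7, 5, 9, 0)],
}
long_lived = drop_short_lived(sightings)
```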

Speaking of spam, the patent tells us that a list of known spam proxies and IP addresses might be maintained that could be used to exclude information from estimates about how many people are behind a specific IP address.

This process could also be used to try to compile a list of suspicious IP addresses, such as an IP address which appears to have a single user behind it but an unusually large number of impressions or conversions. Such an IP address might be listed as a spam address. While the patent doesn’t describe other patterns that might associate addresses with spamming activity, it’s possible that a system like this could potentially look at many other signals as well.
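The two ideas above, excluding known spam addresses and flagging single-user addresses with unusually heavy activity, could be combined along these lines. The patent gives no threshold for "unusually large", so the `max_impressions_per_user` value of 1000 is purely an assumption, as are the field names and sample figures.

```python
def flag_suspicious_ips(stats, known_spam_ips, max_impressions_per_user=1000):
    """Drop known spam IPs and flag single-user IPs with heavy activity.

    stats: dict mapping ip -> {"estimated_users": int, "impressions": int}
    (hypothetical field names). Returns (usable_stats, suspicious_ips).
    """
    usable = {}
    suspicious = set()
    for ip, s in stats.items():
        if ip in known_spam_ips:
            continue  # excluded from the estimates entirely
        if s["estimated_users"] == 1 and s["impressions"] > max_impressions_per_user:
            suspicious.add(ip)  # candidate for the spam list
        usable[ip] = s
    return usable, suspicious

# Invented sample data.
stats = {
    "203.0.113.10": {"estimated_users": 1, "impressions": 40},
    "203.0.113.11": {"estimated_users": 1, "impressions": 25000},
    "203.0.113.12": {"estimated_users": 30, "impressions": 25000},
    "203.0.113.13": {"estimated_users": 1, "impressions": 90000},
}
usable, suspicious = flag_suspicious_ips(stats, known_spam_ips={"203.0.113.13"})
```

Note that the address with 30 estimated users is not flagged despite its heavy traffic, since many people behind one address can plausibly generate that much activity.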

Conclusion

As a searcher, or site owner, or SEO, why should you be concerned about how Google might estimate the number of users behind different IP addresses?

One major reason is that the information collected by a search engine about visits from different IP addresses may influence how a search engine like Google operates in areas such as identifying click fraud, incorporating user-behavior data into search rankings, and removing from keyword tools the search volume that comes from automated queries or from people checking rankings instead of searching for information on a specific topic or query term.

It’s also helpful to get a better sense of how much information, and what kinds of information Google might collect about people who use the tools it offers.

Turning cookies off on your browser doesn’t mean that Google won’t be able to distinguish between your searches and those of someone else who may share an IP address with you. Google can and likely will use other information to get a sense of how many people are behind an IP address, including which browser you use and the add-ons you’ve installed upon it, the size of the window you use to browse with, your browsing preferences, and more.

Categories : SEO Tips

Google's Matt Cutts | How to Get Better Visibility on Google

Wednesday, August 4th, 2010

June 25, 2008
USA TODAY's Jefferson Graham interviews Google engineer Matt Cutts on how to get your site to the top of Google with 5 basic, common sense SEO tips. Matt Cutts guests on the USA TODAY Talking Tech web video show. New episodes air weekly at http://tech.usatoday.com
Category: Science & Technology
Tags: mattcutts, seo, usatoday, talkingtech, jeffersongraham, google

Categories : video Seo Tips

SEO Tips & news search ranking

Wednesday, August 4th, 2010

September 01, 2009
More SEO tips & tutorial. Also Google has a great new SEO news vid. If you want to rank high in Google News Search you need to see it.

Categories : video Seo Tips

February 03, 2009
SEO tips on how to get a high ranking in Google and other search engines from a successful website owner. See examples of search engine optimization and why Easywebsite101 is one of the best web design books and SEO tools perfect for beginners and website owners.
Category: Science & Technology
Tags: seo books, high ranking website, high ranking in google, web design books, seo tools

Categories : video Seo Tips