How does Google see your website?

How does the spider see?

Animals have different ways of seeing than humans. Dogs are popularly thought to be unable to see in color. The compound eyes of flies are particularly interesting. At one point, scientists thought the fly's compound eye produced many views, like a wall of TV screens all tuned to the same channel. Today, they think it works like one large screen, such as the display at a baseball stadium, with each facet of the eye relaying a portion of the larger image. Spiders have eight legs, and most also have eight eyes. Some spiders see quite well. You can read more about how spiders see here.

But this post isn't really about dogs, or flies, or spiders. It's about the Google spider, and how it sees your website.

How does the Googlebot see your website?

Googlebot is Google's digital "spider". It is software that visits your web pages. It will read your robots.txt file, grab your sitemap.xml file, and perhaps "crawl" your links.
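
If you'd like to see what the rules in such a file look like in practice, here is a minimal Python sketch. It uses the robots.txt parser from Python's standard library, which is not Googlebot's own code, and the sample file contents, the example.com URLs, and the /private/ folder are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the folder and sitemap URL are invented.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Sitemap: https://example.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that we already have in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Ask whether a crawler calling itself "Googlebot" may fetch each URL.
print(parser.can_fetch("Googlebot", "https://example.com/blog/first-post"))
print(parser.can_fetch("Googlebot", "https://example.com/private/thesis.pdf"))

# List any sitemap URLs declared in the file (requires Python 3.8 or later).
print(parser.site_maps())
```

The first check should come back allowed and the second blocked, because the sample file excludes the /private/ folder for all crawlers.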

An Aside about Crawling

Years ago I surprised a college professor by trying to engage him in a discussion about a mistake in his thesis, published many years before my enquiry. He was much more interested in how I had obtained his thesis than he was in the mistake (which was really minor in nature).
It turns out he had stored documents on a server, and his IT department had not excluded the location from crawling. Google had indexed all the documents on the server, including the PDF version of his thesis, which he had placed in a long-forgotten folder.

Whereas he and his IT group had unintentionally provided "food" for the spider, we bloggers have a different problem. We are hoping to attract the spider and interest it in indexing our content.

Text Is The Most Important Part

We humans spend a great deal of time trying to entertain our human readers with graphics and video and photographs, but the digital readers, the spiders, are much easier to entertain.  Ultimately, the content that is important to Google's spider is text. 

Longer Posts are Better

The Googlebot needs a nice long book to read. The content needs to be original; by that I mean it should not be plagiarized from other websites. Sure, it can talk about the same subjects as other websites. It just cannot use the same pages, paragraphs, and sentences.

Your Content Should be Original 

Just like writing a high school or college paper, your content ought to be your own. 

Automated Plagiarism Checkers are Available

If you are buying content from others, you can also pay for plagiarism-checking software that scans the web to make certain you are not being sold content copied from Wikipedia, for example.

Google Reveals What You Need 

Google does not hide how it sees your website. Google Search Console, which is available to anyone who owns a website, has many tools to help website owners understand how Google sees their sites.

Stay Away from Black Hat Techniques

At one time, it was possible to rank highly in Google by using "black hat" techniques. For example, one could have an agreement with other webmasters to link to each other's content. Back then, all backlinks were good links and would elevate your ranking. Today, Google knows there is little reason for a site in Moldova to be linking to a weblog writer's blog. There are no dishonest tricks that are likely to help you rank quickly.

Understanding What Google Wants Will Help

Tricks won't help you, but understanding the Googlebot will. This blog would currently fit in a "thimble". When you distill the site down to what the Google spider is reading, you can see that there's not much content here. I haven't published enough content yet. It is not a book yet. It's an article, and not a particularly long article. If you're starting out on your website, your website likely belongs in the same 'club'. It is likely just not "all that" you think it is.
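
To get a rough feel for that distillation, here is a minimal Python sketch that strips a page down to its text and counts the words. It is only an approximation built on Python's standard-library HTML parser. The real Googlebot does far more than this, and the sample page and the names TextOnlyParser and page_to_text are my own inventions for illustration.

```python
from html.parser import HTMLParser


class TextOnlyParser(HTMLParser):
    """Collects the text of a page and skips the contents of script and style tags."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than zero while inside a skipped tag
        self.chunks = []      # pieces of text found so far

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_text(html: str) -> str:
    """Reduce an HTML document to its text, joined with single spaces."""
    parser = TextOnlyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page: the image and the script contribute no text.
SAMPLE_HTML = """
<html><head><title>My Blog</title><style>p { color: red; }</style></head>
<body><h1>How Google sees you</h1>
<img src="photo.jpg" alt="a spider">
<p>Text is what counts.</p>
<script>console.log("ignored");</script>
</body></html>
"""

text = page_to_text(SAMPLE_HTML)
print(text)
print(len(text.split()), "words")
```

Run against your own page's HTML, a count like this makes it plain how little is left once the graphics, styling, and scripts are set aside.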

Use a Googlebot Simulator

To see how Google sees your pages, you'll need the help of a simulator website. There are several from which to choose. They go by different names, e.g. "search engine simulator", "googlebot simulator", "search spider simulator", etc. Their purpose is to reduce your website down to text, tags, and keywords or phrases. Here are some of the ones I've tried.

Three Search Engine Simulators

I recommend these simulators. If you have others you like, let me know in the comments.

