Unimportant Words Impact Relevancy

Keyword Strategies by Mark Sprague

In the search world there are two categories of words that provide little or no value to your company when you are trying to optimize your website for organic search results:

      • Stop words
      • Unimportant terms identified in the search engine indexing Filtration Process.

The famous phrase to the left has only one term that will survive the filtration process (the word Question); the rest of the terms are assigned no value during the scoring process. To understand why, it is useful to know that a page goes through five phases to produce a score for the terms on that page. These are:

  1. Markup and Format Removal
  2. Tokenization
  3. Filtration
  4. Stemming
  5. Weighting

Producing a Term Score

The first two phases (markup and format removal, and tokenization) are known as the Document Linearization process. This is where formatting and page markup codes are removed from the web page, including all HTML, CSS instructions, scripts, comments and other special formatting codes. During tokenization the page is reduced to a list of terms; a second level of processing then ensures that terms are lower-cased, punctuation is removed and other rules are applied.
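To make the linearization step concrete, here is a minimal sketch in Python using only the standard library. It is an illustration under simple assumptions, not how any particular search engine does it: the names TextExtractor, linearize and tokenize are hypothetical, and the tokenizer simply keeps runs of letters and digits.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Comments never reach this method, so they are dropped automatically.
        if self._skip_depth == 0:
            self.parts.append(data)


def linearize(html):
    """Markup and format removal: return only the text of the page."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def tokenize(text):
    """Lower-case the text and split it into terms, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


page = (
    "<html><head><style>p {color: red;}</style>"
    "<script>var x = 1;</script></head>"
    "<body><!-- note --><p>Einstein: Theory of Relativity!</p></body></html>"
)
print(tokenize(linearize(page)))
```

A real indexer would apply many more rules at this stage (for example, handling hyphens, numbers and non-English text), but the output has the same shape: a flat list of lower-cased terms.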

Filtration

The third step, known as filtration, is where all the terms are divided into two groups:

  • Important Terms
  • Unimportant Terms

Important Terms

The task is to identify those terms that can be used in future weighting and relevancy calculations. These terms must meet two criteria:

First, the term must describe what the document is about (e.g., Einstein, theory, relativity, speed, light, mc2, energy, gravitation, black holes, physics, cosmic microwave, time and radiation).

Second, the term must differentiate the document from other documents in the document database. In this second test, the terms speed, light and time exist in far too many contexts, so they can't be used to differentiate documents.
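The second criterion can be illustrated with a simple document-frequency check. The sketch below is an assumption-laden example rather than a description of a real engine: the function names, the tiny four-document corpus and the 50% cutoff (max_share) are all hypothetical choices made for illustration.

```python
def document_frequency(corpus):
    """Map each term to the number of documents it appears in."""
    counts = {}
    for terms in corpus:
        for term in set(terms):  # count each term once per document
            counts[term] = counts.get(term, 0) + 1
    return counts


def differentiating_terms(doc_terms, corpus, max_share=0.5):
    """Keep terms that appear in at most max_share of the documents."""
    total = len(corpus)
    if total == 0:
        return list(doc_terms)
    df = document_frequency(corpus)
    return [t for t in doc_terms if df.get(t, 0) / total <= max_share]


# Hypothetical corpus: each document is already reduced to a list of terms.
corpus = [
    ["einstein", "relativity", "speed", "light", "time"],
    ["camera", "shutter", "speed", "light"],
    ["train", "speed", "time", "schedule"],
    ["lamp", "light", "time", "switch"],
]
print(differentiating_terms(corpus[0], corpus))
```

In this toy corpus, speed, light and time each appear in three of the four documents, so they are dropped, while einstein and relativity appear in only one and are kept.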

Unimportant Terms

The task is to identify frequently used words that provide no value during weighting. The following list contains the 25 most common terms in the English language, and you can see why they provide no value:

the, of, to, and, a, in, is, it, you, that, he, was, for, on, are, with, as, I, his, they, be, at, one, have and this
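As a rough illustration, a stop word filter built from the 25 terms above might look like the following sketch. The names STOP_WORDS and remove_stop_words are hypothetical, and the sketch assumes the terms have already been lower-cased during tokenization (so "I" is stored as "i").

```python
# The 25 most common English terms listed above, lower-cased.
STOP_WORDS = frozenset({
    "the", "of", "to", "and", "a", "in", "is", "it", "you", "that",
    "he", "was", "for", "on", "are", "with", "as", "i", "his", "they",
    "be", "at", "one", "have", "this",
})


def remove_stop_words(tokens):
    """Return the tokens that are not on the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


tokens = "the speed of light is a constant".split()
print(remove_stop_words(tokens))
```

Real stop word lists are usually longer than 25 entries and vary from one search engine to another, so treat this list as an example only.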

As you work your way down the list you begin to see terms that are commonly found in keyword phrases, company names, product names, taglines, URL strings and in ad copy.

You will see these words used everywhere – they are on the list of the 500 most common words in the English language.

Hot, boat, cause, boy, home, hand, large, big, white, children, music, book, mark, feet, rain, eat, fish, mountain, north, wood, paper, war, river, car, color, friend, horse, watch, love, money, road, map, machine, star, online, web, men, animal, mother, house, father, school, family, black, rock, moon, foot, gold, city, tree, door, king, language and game. 

Because they are so common, these words provide little SEO value from a relevancy perspective. They fail to satisfy the second condition for identifying an important term: they won't differentiate documents from each other, and accordingly will not receive a weighting value during indexing.

A second category of stop words consists of terms that have many meanings and appear in many contexts. Prepositions fall into this category, but so do some adjectives, such as the color red. The words Web, online and Internet also fall into this category, as they appear within the context of every subject and every market segment in the world.

Observation

Of course, this is a technical discussion of how search engines parse your page and try to understand what it is about. In practice, publishers tend to target two- and three-word combinations, which can carry more importance than the individual words. For example, if you take two common terms from the above list (Music and Map), you find the following search result counts:
  • Music: 9.5 billion search results
  • Map: 5.9 billion search results
  • Music Map: 1.97 million search results
  • MusicMap: 70K search results

These terms are so common that they produce staggering numbers of search results – even with the two-word combination. However, producing a new concatenated word (MusicMap) reduces the universe to 70K results. This results in a term that is potentially more valuable, but you will have to put the time and money into branding it.

Find Out More

To find out more about User Intent, Keyword Research and other important SEO topics, send me (Mark Sprague) an email at Mark@MSprague.com, or call me at 781-862-3126.

About Lexington eBusiness Consulting

Mark Sprague’s 25 years of product development experience, which includes expertise in Search Engines, Information Products, SEO platforms and Social Networking applications, provide in-depth expertise to help you refine products and services, and improve your search engine marketing and website’s performance by:

  • Developing a superior data-driven SEO strategy for your website.
  • Understanding your customers’ search behavior and normalizing it to your content strategy.
  • Understanding how search engine technology practically impacts SEO and content strategies.
  • Understanding how search technology impacts content in a social networking environment.
  • Developing a superior user experience based on sound information architecture, usability and coding standards.

Lexington eBusiness Consulting
Mark Sprague,  CEO
580 Lowell Street
Lexington, MA 02420
