FairWinds Partners, LLC

The Cost of Typosquatting

Volume 5, Issue 2 | June 23, 2010

METHODOLOGY

1. Compiling a List of Typosquatted Sites

To begin gathering data, we first looked at Quantcast’s [1] list of most highly trafficked websites. Starting with the most highly trafficked site, we measured each domain name against a set of criteria for inclusion in our study. The first 250 domain names that met our criteria became our base data set. The criteria for inclusion were as follows:

  • Domain Tools [2], the typo spinning software we used to generate the typos for the study, offers all results in dot-COM/NET/ORG/BIZ/INFO/US. As one would expect to find in such a study, the majority of registered typographical variations, 74 percent in total, fell under the dot-COM extension.
  • The domain name had to be at least six characters in length. Requiring that domain names in a data set contain a minimum of six characters helps to decrease the chance that a typo of a target domain is a different correctly spelled domain.

Based on Internet user behavior, we know that there are instances where direct navigators will remove hyphens from brand names when turning that brand name into a domain name. For example, Internet users searching for the Merriam-Webster Dictionary online may type in merriamwebster.com rather than merriam-webster.com. Many Internet users will likewise add hyphens to the domain name if the brand itself contains or once contained hyphens. For example, while Wal-Mart Stores Inc.’s most frequently communicated domain name is walmart.com and the company has recently removed the hyphen from its brand, many will still type in wal-mart.com. We identified five of these domain names on our list and included their hyphenated or unhyphenated counterparts in the study as well. As a result, our list of 250 became a list of 255. Once we settled on this list of 255 names, we recorded each registered typo of these domains across the more common gTLDs—.COM, .ORG, .BIZ, .INFO, .NET—and .US. This produced an initial data set consisting of 32,836 registered domains.
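The length criterion and the hyphen handling described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study: the function names are hypothetical, and it assumes the six-character minimum applies to the label before the extension, which the text does not state explicitly.

```python
MIN_LABEL_LENGTH = 6  # assumption: the minimum applies to the name without its extension


def split_domain(domain):
    """Split 'walmart.com' into ('walmart', 'com')."""
    label, _, tld = domain.lower().partition(".")
    return label, tld


def meets_length_criterion(domain):
    """Return True if the name before the extension has at least six characters."""
    label, _ = split_domain(domain)
    return len(label) >= MIN_LABEL_LENGTH


def hyphen_counterpart(domain, brand_name=None):
    """Return the hyphenated or unhyphenated counterpart of a domain, or None.

    A hyphenated domain maps to its unhyphenated form. An unhyphenated domain
    only has a counterpart when a hyphenated brand name is supplied, because
    the hyphen position cannot be guessed from the domain alone.
    """
    label, tld = split_domain(domain)
    if "-" in label:
        return f"{label.replace('-', '')}.{tld}"
    if brand_name:
        candidate = brand_name.lower().replace(" ", "")
        if "-" in candidate and candidate.replace("-", "") == label:
            return f"{candidate}.{tld}"
    return None
```

For example, `hyphen_counterpart("merriam-webster.com")` gives `merriamwebster.com`, and `hyphen_counterpart("walmart.com", "Wal-Mart")` gives `wal-mart.com`.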

2. Projecting Traffic to Each Website

Using FairWinds’ proprietary traffic calculation method, we determined the annual traffic numbers for each of these domain names.

3. Examining Website Content

We recorded the registration data for these domains and, based on the registrant and registrar, labeled them as follows:

  • Domains owned by the same brand that owned the target domain were labeled “Brand Owner” domains. These domains are legitimately owned and should either not factor into an analysis of the cost of typosquatting (if the brand is using the domain name to point to relevant content) or should be evaluated as a different type of harm that the brand has inflicted upon itself (if the domain is owned by the brand owner but not being used properly).
  • Domains owned by a brand owner other than the owner of the target domain—most likely due to the fact that a typo of the target domain is another brand—were labeled as “Other Legitimate Owner” domains. Requiring that the target domains have a minimum of six characters reduces these occurrences, but it does not eliminate them entirely.
  • Finally, all of those names that did not belong to a brand owner fell under the umbrella of “Potential Squatter” domains. “Potential Squatter” domain names fall into one of two categories—either the identifying information regarding the owner is hidden or the information regarding the owner does not belong to a brand that the infringing domain targets. Under these circumstances, we know with certainty that the domain is not owned by the brand, but we do not know whether the domain is being used for a legitimate purpose (such as an opinion site) or for cybersquatting.
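The three ownership labels above amount to a simple decision rule, sketched here in Python. The function and parameter names are hypothetical, and the sketch assumes registrant names have already been normalized so they can be compared directly; the study's actual matching of registrant and registrar data is not described in that detail.

```python
def label_registration(registrant, target_brand_owner, known_brand_owners):
    """Assign one of the three ownership labels to a registered typo domain.

    registrant: registrant name from the registration record, or None/"" when
        the identifying information is hidden.
    target_brand_owner: owner of the correctly spelled target domain.
    known_brand_owners: set of other brand owners recognized as legitimate.
    """
    if registrant and registrant == target_brand_owner:
        return "Brand Owner"
    if registrant and registrant in known_brand_owners:
        return "Other Legitimate Owner"
    # Hidden owner, or an owner that is not a brand the domain targets.
    return "Potential Squatter"
```

Note that a hidden registrant always falls through to "Potential Squatter", matching the first of the two categories described in the last bullet.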

Once examined, this group of potential squatter domains—just over 28,000 domains, or about 85 percent of our original data set—would provide us with information on the losses incurred by brand owners as a result of typosquatted domains.

Each Potential Squatter domain has a target domain—the target domain is derived from the proper spelling of the brand. Each Potential Squatter domain also has a Potential Squatter behind it—the person who registered the infringing domain. In order to determine the content hosted on each of these domain names (from the data set of 28,000 names), we examined the content of 20 percent of the domains owned by each Potential Squatter for each target domain. These domains were chosen randomly, and the content of each domain was labeled as one of the following:

  • Pay-Per-Click (PPC): PPC websites display a collection of sponsored links, usually pertaining to the keywords contained within the domain. A domain name that contains typos of a brand could resolve to a PPC page that may contain links to that brand, links to the brand’s competitors, and links to related sponsored advertisements.
  • Affiliates: Some brands offer affiliate programs, which allow third-party website owners to post the brands’ links and banners on their site or to send traffic to their site directly through domain redirects; in return, the owner of the site that is hosting the link receives a commission for every click-through that results in a purchase, sign-up, etc. While it is usually in breach of an affiliate program agreement, some cybersquatters plug into affiliate programs by using brand typo domains.
  • Does Not Resolve (DNR): These domain names did not resolve to any content at the time of our review. It is possible that the site was simply down temporarily or that these sites continuously exist without content.
  • Infringing Content: These domain names resolve to content similar to that of the target brand, such as “whitepages.net” resolving to a third-party phone directory site.
  • Other: This category captures domains used for a variety of purposes, such as hosting content for contests, blogs about a brand or product, or a registrar/hosting provider’s “coming soon” information.

After initially examining the list of 28,000 and marking domain names housed on Domain Name System (DNS) servers known for hosting PPC sites as “PPC sites,” there were still thousands of domain names to be examined. So, we looked for patterns in the name servers that hosted these domains. We examined 20 percent of the total domains housed on each remaining name server; if every domain in that 20 percent sample resolved to only one type of site (PPC, Affiliate, etc.), the entire group of domains from our data set housed on that server was labeled as that type. Even after this process, we were unable to classify 8,000 of the 28,000 Potential Squatter domains. These 8,000 domains were therefore examined further.
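The server-level labeling step can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure's actual implementation: `inspect` is a hypothetical callback standing in for the manual content review, the sample size is rounded up so that every server contributes at least one domain, and the seed is only there to make the sampling repeatable.

```python
import math
import random


def label_by_nameserver(domains_by_ns, inspect, sample_fraction=0.20, seed=0):
    """Label whole name-server groups from a random sample of their domains.

    domains_by_ns: dict mapping a name server to the list of domains on it.
    inspect: function returning the content type of one domain ("PPC", ...).
    Returns (labels, unclassified): a dict of domain -> content type, and the
    list of domains whose server sample showed more than one content type.
    """
    rng = random.Random(seed)
    labels = {}
    unclassified = []
    for ns, domains in domains_by_ns.items():
        if not domains:
            continue
        sample_size = max(1, math.ceil(len(domains) * sample_fraction))
        sample = rng.sample(domains, sample_size)
        types_seen = {inspect(d) for d in sample}
        if len(types_seen) == 1:
            # The sample was uniform, so the whole group takes that label.
            content_type = types_seen.pop()
            for d in domains:
                labels[d] = content_type
        else:
            unclassified.extend(domains)
    return labels, unclassified
```

Domains returned in `unclassified` correspond to the group that still needed further examination.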

We analyzed the content of 20 percent of these remaining 8,000 domains. We first determined which of the 8,000 domains had significant quantifiable traffic: ten percent of them, or 800 domains, received detectable traffic. We then took a random sample of 800 domains from the remaining 7,200 that did not receive detectable traffic. Together, these 1,600 domains made up the 20 percent sample. Based on the percentages of PPCs, Affiliates, DNRs and Others found in this sample set, we projected the percentages of PPCs, Affiliates, DNRs and Others in the entire population of the 8,000 originally unlabeled domains.
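The projection from the 1,600-domain sample (800 with detectable traffic plus 800 sampled from the other 7,200) to the full 8,000 can be illustrated with a short Python sketch. The function name is hypothetical, and the sample counts in the usage example are invented purely to show the arithmetic; they are not the study's results.

```python
from collections import Counter


def project_counts(sample_labels, population_size):
    """Scale the content-type shares observed in a sample up to a population.

    sample_labels: list of content-type labels, one per sampled domain.
    population_size: number of domains in the group being projected.
    """
    counts = Counter(sample_labels)
    sample_size = len(sample_labels)
    return {
        label: round(population_size * count / sample_size)
        for label, count in counts.items()
    }


# Hypothetical sample: 1,600 reviewed domains, counts made up for illustration.
example_sample = ["PPC"] * 1200 + ["DNR"] * 400
projected = project_counts(example_sample, 8000)
```

With this made-up sample, 75 percent PPC and 25 percent DNR, the projection assigns 6,000 and 2,000 of the 8,000 domains to those categories.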

After these calculations, we determined that 23,374, or 84 percent, of Potential Squatter domains resolve to PPC sites. Affiliate domains account for 5 percent of the Potential Squatter domains, while 6 percent did not resolve, 3 percent hosted “Other” content, and 2 percent resolved to infringing content.

Graph 1

Domains Belonging to Potential Cybersquatters
[1] "Quantcast US Site Rankings." Quantcast. Web. www.quantcast.com