1. Compiling a List of Typosquatted Sites
To begin gathering data, we first looked at Quantcast’s list of most highly trafficked websites [1]. Starting with the most highly trafficked site, we measured each domain name against a set of criteria for inclusion in our study. The first 250 domain names that met our criteria became our base data set. The criteria for inclusion were as follows:
Internet user behavior shows that direct navigators sometimes remove hyphens from a brand name when turning it into a domain name. For example, Internet users searching for the Merriam-Webster Dictionary online may type in merriamwebster.com rather than merriam-webster.com. Many Internet users will likewise add hyphens to the domain name if the brand itself contains or once contained hyphens. For example, while Wal-Mart Stores Inc.’s most frequently communicated domain name is walmart.com and the company has recently removed the hyphen from its brand, many will still type in wal-mart.com. We identified five such domain names on our list and included their hyphenated or unhyphenated counterparts in the study as well. As a result, our list of 250 became a list of 255.
Once we settled on this list of 255 names, we recorded each registered typo of these domains across the more common gTLDs (.COM, .ORG, .BIZ, .INFO, .NET) and .US. This produced an initial data set consisting of 32,836 registered domains.
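The report does not say how candidate typos were enumerated, so the following is only a minimal Python sketch under stated assumptions. The three edit types (dropping, doubling and swapping characters) and the function names `hyphen_variants`, `simple_typos` and `candidate_domains` are illustrative choices, not the study's method. The sketch only removes hyphens, because adding one requires knowing where the brand places it, and it does not check whether a candidate is actually registered.

```python
# The six TLDs named in the text; everything else here is an assumption.
TLDS = ["com", "org", "biz", "info", "net", "us"]


def hyphen_variants(label):
    """Return the label plus its unhyphenated form, if it contains hyphens."""
    variants = {label}
    if "-" in label:
        variants.add(label.replace("-", ""))
    return variants


def simple_typos(label):
    """Candidate typos from three common edits (an illustrative choice)."""
    typos = set()
    for i in range(len(label)):
        typos.add(label[:i] + label[i + 1:])                 # drop one character
        typos.add(label[:i] + label[i] * 2 + label[i + 1:])  # double one character
        if i + 1 < len(label):
            # swap two neighboring characters
            typos.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    typos.discard(label)  # swapping identical neighbors reproduces the label
    # Keep only non-empty labels that do not start or end with a hyphen.
    return {t for t in typos if t and not t.startswith("-") and not t.endswith("-")}


def candidate_domains(label):
    """Expand every candidate typo of `label` across the six TLDs."""
    return sorted(f"{typo}.{tld}" for typo in simple_typos(label) for tld in TLDS)
```

Each candidate produced this way would still need a registration lookup before it could count toward a data set like the 32,836 registered domains described above.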
2. Projecting Traffic to Each Website
Using FairWinds’ proprietary traffic calculation method, we determined the annual traffic numbers for each of these domain names.
3. Examining Website Content
We recorded the registration data for these domains and, based on the registrant and registrar, labeled them as follows:
Once examined, this group of Potential Squatter domains (just over 28,000 domains, or about 85 percent of our original data set) would provide us with information on the losses incurred by brand owners as a result of typosquatted domains.
Each Potential Squatter domain has a target domain—the target domain is derived from the proper spelling of the brand. Each Potential Squatter domain also has a Potential Squatter behind it—the person who registered the infringing domain. In order to determine the content hosted on each of these domain names (from the data set of 28,000 names), we examined the content of 20 percent of the domains owned by each Potential Squatter for each target domain. These domains were chosen randomly, and the content of each domain was labeled as one of the following:
After initially examining the list of 28,000 and marking domain names housed on Domain Name System (DNS) servers known for hosting PPC sites as “PPC sites”, there were still thousands of domain names to be examined. So, we looked to see if there were any patterns in the DNS servers that hosted these domains. We examined 20 percent of the domains from our data set housed on each remaining DNS server. If every domain in that 20 percent sample resolved to a single type of site (PPC, Affiliate, etc.), the entire group of domains from our data set housed on that server was labeled as that type. Using this process, we were still unable to classify 8,000 of the 28,000 Potential Squatter domains. These 8,000 domains were therefore examined further.
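The per-server sampling rule can be sketched roughly as follows. This is an illustration, not the study's actual tooling: the function name `label_by_nameserver`, the `classify` callback, and the choice to round the sample size and examine at least one domain per server are all assumptions.

```python
import random


def label_by_nameserver(domains_by_ns, classify, sample_fraction=0.20, seed=0):
    """Label whole server groups whose sampled domains all share one type.

    domains_by_ns: dict mapping a DNS server name to its list of domains.
    classify: callable returning a content type ("PPC", "Affiliate", ...)
              for one domain; a stand-in for manual content review.
    Returns (labels, unresolved): labels maps domain -> type, and unresolved
    lists the domains from groups whose sample showed more than one type.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    labels, unresolved = {}, []
    for domains in domains_by_ns.values():
        if not domains:
            continue
        # Assumption: examine at least one domain, even on small servers.
        k = max(1, round(len(domains) * sample_fraction))
        sample = rng.sample(domains, k)
        types = {classify(d) for d in sample}
        if len(types) == 1:
            only_type = types.pop()
            for d in domains:  # extend the label to the whole group
                labels[d] = only_type
        else:
            unresolved.extend(domains)  # mixed sample: needs further review
    return labels, unresolved
```

Domains returned in `unresolved` correspond to the group that needs further examination, like the 8,000 domains mentioned above.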
We then analyzed the content of 20 percent of these remaining 8,000 domains, or 1,600 domains. We first determined which of the 8,000 domains had significant quantifiable traffic: 10 percent of them, or 800 domains, received detectable traffic. We then took a random sample of 800 domains from the remaining 7,200 that did not receive detectable traffic. Based on the percentages of PPCs, Affiliates, DNRs and Others found in this combined 20 percent sample set, we projected the percentages of PPCs, Affiliates, DNRs and Others found in the entire population of the 8,000 originally unlabeled domains.
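The projection step amounts to scaling the category shares observed in the 1,600-domain sample up to the 8,000 unlabeled domains. The Python sketch below follows that pooled reading of the text. The function name `project_counts` is hypothetical, and the sample counts are invented purely for illustration; they are not the study's results.

```python
from collections import Counter


def project_counts(sample_labels, population_size):
    """Scale the category shares seen in a sample up to the full population."""
    counts = Counter(sample_labels)
    n = len(sample_labels)
    return {cat: round(population_size * c / n) for cat, c in counts.items()}


# Hypothetical labels for the 1,600 sampled domains (800 with detectable
# traffic plus 800 without). These counts are made up for illustration.
sample = ["PPC"] * 1200 + ["Affiliate"] * 160 + ["DNR"] * 160 + ["Other"] * 80

projected = project_counts(sample, population_size=8000)
```

Because the 800 domains with detectable traffic are that entire group rather than a sample of it, a stratified estimate that weights the two groups separately would be an alternative; the text describes only the pooled percentages.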
After these calculations, we determined that 23,374, or 84 percent, of Potential Squatter domains resolve to PPC sites. Affiliate domains account for 5 percent of the Potential Squatter domains, while 6 percent did not resolve, 3 percent hosted “Other” content, and 2 percent resolved to infringing content.
Graph 1
[1] "Quantcast US Site Rankings." Quantcast. Web. www.quantcast.com