



Marchex has announced that today it will launch over 100,000 new websites, totaling over 1 billion pages of content. These websites target both local searches (e.g. denverautorepair.com, 90210.com) and specific verticals (locksmiths.com). Rather than aiming for a central portal on one domain, as an IYP (Internet Yellow Pages) site such as SuperPages does, Marchex is hoping that each of these thousands of sites will be relevant enough to pull searchers in as they’re discovered through the SERPs. However, one thing you will notice if you go to these individual sites is that they have a very similar look and feel (which you would expect with 100,000 sites being launched at once), and that they all link, in some way, to myzip.com, which can show the exact same information as the individual pages.
Looking at their robots.txt, you can see that they are basically shutting the crawlers out of everything on myzip.com:
User-agent: *
Disallow: /-/home/
Disallow: /-/results/
Disallow: /-/detail/
Disallow: /-/about/
Disallow: /-/terms/
Disallow: /-/privacy/
Disallow: /-/guidelines/
In fact, the only page that I found on myzip.com that could be crawled was the portal page (http://www.myzip.com/-/portal/?p=Portal&). So maybe this is because they’re only having those 100,000 individual sites crawled? After all, they’d get some benefit from the URLs, right? Well, here’s the robots.txt for denverautorepair.com:
User-agent: *
Disallow: /-/results/
Disallow: /-/detail/
Disallow: /-/about/
Disallow: /-/terms/
Disallow: /-/privacy/
Disallow: /-/guidelines/
Aha! A difference! They’re not blocking the /-/home directory on this site. So what is there? All of the unique content? Well… not quite, it’s the same content on each page in that directory, but the sponsored listings are different…
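To make the difference concrete, here is a minimal sketch that feeds both rule sets to Python's standard `urllib.robotparser` and asks which paths a crawler may fetch. The rules are copied from the two robots.txt files quoted above; the helper name `is_crawlable` and the choice of test paths are my own, for illustration only, and real search engine crawlers may interpret rules slightly differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules as quoted in this post for myzip.com.
MYZIP_RULES = """\
User-agent: *
Disallow: /-/home/
Disallow: /-/results/
Disallow: /-/detail/
Disallow: /-/about/
Disallow: /-/terms/
Disallow: /-/privacy/
Disallow: /-/guidelines/
"""

# Rules as quoted in this post for denverautorepair.com (no /-/home/ line).
DENVER_RULES = """\
User-agent: *
Disallow: /-/results/
Disallow: /-/detail/
Disallow: /-/about/
Disallow: /-/terms/
Disallow: /-/privacy/
Disallow: /-/guidelines/
"""


def is_crawlable(rules_text, path, agent="*"):
    """Hypothetical helper: True if `agent` may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/-/home/", "/-/results/", "/-/portal/"):
        print(path,
              "myzip:", is_crawlable(MYZIP_RULES, path),
              "denverautorepair:", is_crawlable(DENVER_RULES, path))
```

Under these rules, `/-/home/` is blocked on myzip.com but open on denverautorepair.com, while `/-/results/` is blocked on both.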
Marchex is looking to distinguish itself in scale and quality from the so-called “domain parking” industry that often prey on accidental visitors to their sites by serving up low-quality advertising links on random pages
From the Reuters article
…such as serving up ads for buying homes in Florida on a page about Denver car repair???

Now, while it’s true that this is a relevance quirk on Yahoo’s side (I reached this page by clicking on the “See sponsored links for: Florida” crawlable link on the site, which is populated by Yahoo), the fact remains that Marchex is allowing these pages to be indexed. Of course, houses in Florida on a Denver page isn’t the most fun example, so how about this crawlable page on locksmiths.com stuffed full of Carmen Electra ads, because when you need a locksmith, you obviously need something to take your mind off being locked out of your house / car…

Admittedly, this is a small sample that I’ve looked at, but it does look a little strange if they’re trying to distance themselves from the domain parking sites, yet the only pages they’re having crawled are those with different ad sets, especially when those ad sets may not be related to the content of the page. It could be that they’ve not yet ‘launched’ these sites fully, and the robots.txt files may be changing, so I’ll check back tomorrow and see if they’re still the same or not, but still…
**update – 2 days later – Looking at the robots.txt files for denverautorepair.com and locksmiths.com, I don’t see any change, so it looks like this is how they intend it to be.






[...] Simon Heseltine: Marchex, local search and Carmen Electra? [...]
Well, here’s my take on it. If they are serving 1,000,000,000 pages, all with some sort of ad network monetization regardless of relevance, let’s say that each page is viewed once a day and that only 1/100 of 1% of those viewers click once; that gives them 100,000 clicks per day. And let’s also say that they only get $0.10 per click. If my math is correct, that would give them a revenue of $10,000 per day, or $3.65 million per year. The real questions are what does it take to keep that much content hosted, and will the billion pages be indexed, right? (Of course, I am just musing, but it’s not a bad model either.)
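The back-of-the-envelope figures above can be checked with a few lines of Python. This is only a sketch of the same guess: the one-view-per-page-per-day figure is an assumption the estimate needs to reach 100,000 clicks, and the function and parameter names are made up for illustration. Exact fractions are used so that no floating-point rounding creeps in.

```python
from fractions import Fraction


def estimate_revenue(pages, click_rate, dollars_per_click,
                     views_per_page_per_day=1):
    """Return (daily clicks, daily revenue, yearly revenue) for a guess.

    All inputs are rough assumptions, not measured figures.
    """
    daily_clicks = pages * views_per_page_per_day * click_rate
    daily_revenue = daily_clicks * dollars_per_click
    yearly_revenue = daily_revenue * 365
    return daily_clicks, daily_revenue, yearly_revenue


clicks, daily, yearly = estimate_revenue(
    pages=1_000_000_000,
    click_rate=Fraction(1, 10_000),       # 1/100 of 1%
    dollars_per_click=Fraction(10, 100),  # $0.10
)

if __name__ == "__main__":
    print(f"clicks per day: {int(clicks):,}")
    print(f"revenue per day: ${int(daily):,}")
    print(f"revenue per year: ${int(yearly):,}")
```

With these inputs the numbers come out to 100,000 clicks and $10,000 a day, or $3,650,000 a year, so the math holds; the open question is whether the assumed traffic and click rate are realistic.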
I am curious about the Carmen Electra connection, and while I am tempted to offer a humorous reply, it would surely be off colour, so I will just agree with your assessment.
Speaking of ad networks, I am now testing the Azoogle network to see if it will produce more than my weak Google AdSense checks.
Indeed. I didn’t say that it was necessarily a bad model, and with the scope of the effort you’ve got to think that they’ve got a good content management system and a very scalable setup. As to whether the pages will be indexed, that was really the point of this piece. On the sites I looked at, they’re not currently opening up the ‘next’ pages to the crawlers, but they are opening up the home pages, which have the same content and different ads. I find that to be a curious choice. They have content on the ‘next’ pages, so why not allow that to be spidered as well?
Again, this could just be a scalability issue, with the final robots.txt files not yet having been released to production in the correct form. Time will tell.
[...] Marchex, local search and Carmen Electra? …such as serving up ads for buying homes in Florida on a page about Denver car repair??? Now before anyone says that this must be a quirk on Google’s side, I reached this page by clicking on the “See sponsored links for: Florida” … [...]
Seems to me that they are shooting themselves in the foot. Unrelated ads on a “content” page with the same content found someplace else just seems like a bad idea. The more pages of unique content indexed the better.