I’ve finally gotten access to the blo.gs ping cloud and am hoping to learn some things from the data. Some quick background: most blogging packages can notify a remote server every time a blog is updated. One such server is blo.gs, which is now owned by Yahoo. blo.gs forwards the pings on to any listeners connected to its ping cloud, hence my interest. Search engines like Technorati and PubSub, for example, all rely on these ping services. They are even trying to work together to share pings with each other via an effort called FeedMesh.
I finally wrote a simple logger to connect to the cloud and write the pings out to disk. It’s a goofy little 50-line Perl program that basically just dumps the strings to a file, adding the local time to each ping tag. I’m dropping the pings to a file so that the service I write to process and index them is decoupled from the reading of pings. That way, one won’t slow down the other. Of course, I’m running all of this on my iMac, which is doing other stuff, so we’ll see how that works out. Might be time to buy a Linux box.
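For the curious, here is a rough sketch of that kind of logger, written in Python rather than Perl. It is not the actual script. It assumes each ping arrives as a single self-closing XML tag on its own line, and the function names `stamp_ping` and `log_pings` and the `received` attribute are made up for illustration.

```python
import time


def stamp_ping(line, received=None):
    """Return the ping line with a received="..." attribute added.

    Assumption: each ping is one self-closing XML element per line,
    for example <weblog name="..." url="..." />. The real wire format
    may differ.
    """
    if received is None:
        received = time.strftime("%Y-%m-%dT%H:%M:%S")  # local time
    line = line.rstrip("\r\n")
    if line.endswith("/>"):
        head = line[:-2].rstrip()
        return f'{head} received="{received}" />'
    # Anything that is not a self-closing tag is passed through untouched.
    return line


def log_pings(stream, out):
    """Copy pings from stream to out, stamping each one; return the count."""
    count = 0
    for line in stream:
        if not line.strip():
            continue  # skip blank keep-alive lines
        out.write(stamp_ping(line) + "\n")
        out.flush()  # flush so a separate indexer can follow the file
        count += 1
    return count
```

To run it against a live feed, you would pass in something like `sock.makefile("r")` from a connected socket as `stream` and a file opened in append mode as `out`. Writing to a plain file keeps the reader dumb and fast, and the indexer can follow the file at its own pace.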
There are a lot of pings coming in each second. In the first minute I was connected I got 980 pings, and that’s at around 1:30AM ET when only dorks like me are blogging. I’ve learned quite a bit already, even though I’ve only started to peruse the data. There are a lot of spam blogs (splogs), for one thing. Most surprising to me, though, is that non-blog services ping the cloud, too. For example, Flickr, Craigslist, Topix.net, and a number of other services are pinging it.
So, now the fun comes. How do you handle and then crawl dozens of pings per second? How do you identify spam so you can avoid recording it entirely? Ping-O-Matic has recorded days where they averaged 69+ pings per second. And that’s assuming the pings were spread evenly throughout the 24-hour period… not likely, but remotely possible given the global scale of things. So, it’s likely that the peak periods during the day are far above 70-100 pings per second, especially on heavy blogging days.
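To put those rates in perspective, here is a quick back-of-the-envelope calculation in Python. The 69 pings per second and the 980 pings in one minute come from the numbers above. The peak-to-average ratio of 1.5 is purely a guess for illustration, not a measured value.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pings_per_day(avg_per_second):
    """Total pings in a day at a given average rate."""
    return avg_per_second * SECONDS_PER_DAY


def peak_rate(avg_per_second, peak_to_average):
    """Peak rate implied by an average and an assumed peak-to-average ratio."""
    return avg_per_second * peak_to_average


print(pings_per_day(69))    # 5961600 pings in a day at 69 per second
print(round(980 / 60, 1))   # 16.3 pings per second in my first minute
print(peak_rate(69, 1.5))   # 103.5 per second, if peaks run 1.5x the average (a guess)
```

So a 69-per-second day is nearly six million pings, and even a modest peak over the average pushes past 100 per second.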
I’ll leave this up and see what I get over the next few days. I might even write a more robust client. This is going to be a neat experiment.
(And I get to see how long it takes before this post shows up.)




