Lyte's Blog

Bad code, bad humour and bad hair.

WTF Is BLP_bbot?

At the time of writing, no one else I could find via Google seemed to have much of a clue what BLP_bbot actually was either.

A bit of background

I spent the best part of a day trying to get a web server that was continually crashing to not crash quite so often. The first thing noted by Bill (not a real name), the guy I was working with, was that the reverse proxy Squid instance that sits in front of the Apache instance was MISSing for a huge number of requests. After a bit of digging I found a pattern to the MISSing URLs: they all seemed to be */{up,down,left,right}.gif type paths. At first I thought someone had figured out that it was taking this server 45+ seconds to generate a 404 (I know, I know, it’s WAY too long… but apparently it’s always been that way and I was only asked to examine and fix what was making it crash right now, not actually fix the whole thing up) and was purposefully bringing us down. It turns out someone had uploaded some bad JavaScript 2 years ago that was generating all the 404s; it looks like maybe the traffic level, or the amount of info the dynamic content has to evaluate, had hit some magical threshold.
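For anyone wanting to spot the same kind of pattern, here is a rough sketch of how the MISSes could be tallied by file name. It assumes Squid’s default native access log layout (result code in the fourth whitespace-separated field, URL in the seventh); the function name and the sample log lines are made up for illustration, and this is not the exact command I ran on the day.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_misses(lines):
    """Tally cache MISSes by the last path segment of the requested URL.

    Assumes Squid's native access.log layout:
    time elapsed client result/status bytes method URL ...
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # skip truncated or malformed lines
        result_code, url = fields[3], fields[6]
        if "MISS" not in result_code:
            continue  # only interested in requests that fell through to Apache
        path = urlsplit(url).path
        counts[path.rsplit("/", 1)[-1]] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "1286536309.450 45012 10.0.0.5 TCP_MISS/404 512 GET http://example.com/a/up.gif - DIRECT/10.0.0.1 text/html",
        "1286536310.450 45007 10.0.0.5 TCP_MISS/404 512 GET http://example.com/b/up.gif - DIRECT/10.0.0.1 text/html",
        "1286536311.450 3 10.0.0.6 TCP_HIT/200 2048 GET http://example.com/index.html - NONE/- text/html",
    ]
    for name, hits in count_misses(sample).most_common(10):
        print(hits, name)
```

Grouping by the last path segment is what makes a */up.gif style pattern jump out even when the directories in front of it all differ.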

Anyway, long story short: I fixed the JavaScript (from a Sys Admin point of view; it still didn’t actually work :p ) and watched the server for a bit. The load went down by HUGE amounts from where it started. It seemed ok, so I went home.

So what is BLP_bbot???

Well, the next day I got told the server was much better but it had still crashed once overnight (crashing in this case was the cronned fail script picking up that Apache was stuck and restarting the service, killing all crud procs). I started watching Apache’s Server-Status page again. We were very slowly ramping up dead requests on a /news URL. A dig through the Squid access logs revealed that most of the requests for /news were from a BLP_bbot. Stracing the dead procs showed they were trying to get a futex and deadlocking. I thought maybe the bot was sending through a non-standard request and that was somehow failing our service (I was wrong).
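A rough sketch of that kind of log dig, counting user agents for requests under a given path. Note the assumption: Squid’s native log format doesn’t record the user agent, so this presumes the access log is written with an Apache-style “combined” logformat. The regex, the function name and the sample user agent strings are all invented for illustration.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches the request, status, size, referer and user agent parts of an
# Apache-style "combined" log line (assumed format, see above).
LINE_RE = re.compile(
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def agents_for_path(lines, prefix="/news"):
    """Count user agents for requests whose URL path starts with prefix."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a combined-format line
        if urlsplit(match.group("url")).path.startswith(prefix):
            counts[match.group("agent")] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines; the user agent strings are placeholders.
    sample = [
        '10.0.0.5 - - [08/Oct/2010:12:00:00 +1100] "GET http://example.com/news HTTP/1.1" 200 1234 "-" "BLP_bbot/0.1" TCP_MISS:DIRECT',
        '10.0.0.5 - - [08/Oct/2010:12:00:05 +1100] "GET http://example.com/news?page=2 HTTP/1.1" 200 1234 "-" "BLP_bbot/0.1" TCP_MISS:DIRECT',
        '10.0.0.9 - - [08/Oct/2010:12:00:07 +1100] "GET http://example.com/about HTTP/1.1" 200 900 "-" "ExampleBrowser/1.0" TCP_HIT:NONE',
    ]
    for agent, hits in agents_for_path(sample).most_common(5):
        print(hits, agent)
```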

I spent a while trying to figure out what the bot actually is as most results in Google are just people asking “wtf is BLP_bbot?!@#” and others replying with answers to an unasked question about how to block it.

Geo-IP information was very revealing, showing that the ISP the IPs belong to is “Bloomberg Financial Market”. After a bit more digging I found that Bloomberg do indeed refer to themselves as “BLP”.

Chances are if I blocked the BLP_bbot, some shareholder somewhere would suddenly start losing money. It’s unlikely to be documented well anywhere, and they probably don’t want anyone to really know for sure, but my guess is that it’s used for some sort of autonomous financial trading. If I blocked it I might unwittingly tip its metrics in favour of dumping stock for whoever I happened to be working for that day. Never a fantastic career tactic, so I chose to avoid it.

In short, if BLP_bbot is crawling you it’s probably because you’re actually important enough in an economic sense to be worth looking at and if you block it, I would expect bad and unintended results.

For those that care

So at this point I’d spent an hour or more wasting time hoping that this bot was sending malicious packets and I could just block it, making the problem go away. It turns out I had an interaction problem between PHP’s memory_limit being exceeded and SHM in eaccelerator, though. I took a wild stab in the direction of eaccelerator because I started to think about what might make a futex call, and for a while I was stumped. We’re not using them directly in PHP. PHP shouldn’t be using them for anything except possibly session sharing (but I know a session deadlock happens on flock calls because we store sessions on disk). Apache shouldn’t be using them under the MPM model we’re employing. So the only thing left was eaccelerator, or maybe the pgsql bindings, but we had persistent db connections off so that seemed really unlikely (also, disabling the pgsql bindings would cause our app to malfunction just a tad more than it already was).

After turning off eaccelerator the server became stable, slow (yeh well it was slow before, so maybe “slothful” is the right term), but definitely stable.

After a bit of digging we found a bug in eaccelerator that causes deadlocks when using SHM on the version we were running; in theory it’s fixed in the current stable. So I upgraded and reenabled it, and the server started crashing again. Doh. Back off.

At this point “Bill” noticed that /news wasn’t quite rendering as much HTML as it should (at the very least it was missing a tag). A bit of digging through logs revealed that we were hitting the memory_limit for that page and the proc was being abnormally terminated. Fixed that up, reenabled eaccelerator, it was stable now.
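If you want to check whether a page is quietly hitting memory_limit, something along these lines would find the offenders in an error log. It relies on the wording of PHP’s standard fatal error (“Allowed memory size of N bytes exhausted … in FILE on line N”); the function name, the script path and the sample log lines are hypothetical, and where PHP errors end up depends on your own logging config.

```python
import re
from collections import Counter

# PHP's fatal error when a script exceeds memory_limit looks like:
#   Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes)
#   in /path/to/script.php on line 120
EXHAUSTED_RE = re.compile(
    r"Allowed memory size of (\d+) bytes exhausted .*? in (\S+) on line (\d+)"
)


def memory_limit_hits(lines):
    """Count memory_limit fatals per script path found in error log lines."""
    hits = Counter()
    for line in lines:
        match = EXHAUSTED_RE.search(line)
        if match:
            hits[match.group(2)] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; the paths are placeholders.
    sample = [
        "[Fri Oct 08 12:00:00 2010] [error] [client 10.0.0.5] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /var/www/news.php on line 120",
        "[Fri Oct 08 12:00:09 2010] [error] [client 10.0.0.5] PHP Fatal error:  Allowed memory size of 33554432 bytes exhausted (tried to allocate 16 bytes) in /var/www/news.php on line 98",
        "[Fri Oct 08 12:00:11 2010] [error] [client 10.0.0.7] File does not exist: /var/www/img/up.gif",
    ]
    for script, count in memory_limit_hits(sample).most_common():
        print(count, script)
```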

Now I have another problem: I want to figure out how to reproduce this bug so I can lodge it with the nice eaccelerator people, but it only seems to exist on the production instance of this system. I can’t make it fail in the same way anywhere else. I guess I’ll just have to hope someone else discovers the same thing and documents a sequence of events to reproduce it… I’m not holding my breath.