At the time of writing, no one else I could find via Google seemed to have much of a clue what BLP_bbot was either.
A bit of background
Well, the next day I got told the server was much better, but it had still crashed once overnight (“crashing” in this case meaning the cronned fail script noticing that Apache was stuck and restarting the service, killing all the crud procs). I started watching Apache’s Server-Status page again. We were very slowly ramping up dead requests on a /news URL. A dig through the Squid access logs revealed that most of the requests for /news were from a BLP_bbot. Stracing the dead procs showed they were trying to get a futex and deadlocking, so I thought maybe the bot was sending through a non-standard request that was somehow breaking our service (I was wrong).

So what is BLP_bbot???
I spent a while trying to figure out what the bot actually is, as most of the results in Google are just people asking “wtf is BLP_bbot?!@#” and others replying with answers to an unasked question about how to block it.
Geo-IP information was very revealing, showing that the ISP the IPs belong to is “Bloomberg Financial Market”. After a bit more digging I found that Bloomberg do indeed refer to themselves as “BLP”.
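Digging the bot’s traffic out of the logs is simple enough to script. This is a sketch, not our actual tooling: the sample lines are made up, and it assumes a Squid logformat that records the client IP and the User-Agent (the stock format doesn’t log the User-Agent, so your field positions will vary):

```python
import re
from collections import Counter

# Made-up sample lines standing in for a Squid access log whose logformat
# includes the client IP and a quoted User-Agent; adjust the regex to match
# your actual format.
log_lines = [
    '1234567890.123 10.0.0.1 GET /news "BLP_bbot/0.1"',
    '1234567890.456 10.0.0.1 GET /news "BLP_bbot/0.1"',
    '1234567890.789 10.0.0.2 GET /index "Mozilla/5.0"',
]

pattern = re.compile(r'^\S+ (\S+) \S+ (\S+) "([^"]*)"')
hits = Counter()
for line in log_lines:
    m = pattern.match(line)
    if m and "BLP_bbot" in m.group(3):
        hits[m.group(1)] += 1  # count requests per client IP

for ip, n in hits.most_common():
    print(ip, n)  # these are the IPs you’d feed to whois/GeoIP
```

With the sample data above this prints `10.0.0.1 2`; the resulting IPs are what the geo-IP lookup was run against.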
Chances are that if I blocked BLP_bbot, some shareholder somewhere would suddenly start losing money. It’s unlikely to be well documented anywhere; they probably don’t want anyone to really know for sure, but chances are it’s used for some sort of autonomous financial trading. If I blocked it I might unwittingly tip its metrics in favour of dumping stock for whoever I happened to be working for that day. Never a fantastic career tactic, so I chose to avoid it.
In short, if BLP_bbot is crawling you, it’s probably because you’re actually important enough in an economic sense to be worth looking at, and if you block it, I would expect bad and unintended results.
For those that care
So at this point I’d spent an hour or more wasting time, hoping that this bot was sending malicious packets and I could just block it, making the problem go away. It turned out I had an interaction problem between PHP’s memory_limit being exceeded and SHM in eaccelerator. I had a wild stab in the direction of eaccelerator because I started thinking about what might make a futex call, and for a while I was stumped. We’re not using futexes directly in PHP; PHP shouldn’t be using them for anything except possibly session sharing (but I know session deadlocks happen on flock calls, because we store sessions on disk); and Apache shouldn’t be using them under the MPM model we’re employing. The only things left were eaccelerator, or maybe the pgsql bindings, but we had persistent DB connections off so that seemed really unlikely (also, disabling the pgsql bindings would cause our app to malfunction just a tad more than it already was).
After turning off eaccelerator the server became stable. Slow (yeah, well, it was slow before, so maybe “slothful” is the right term), but definitely stable.
After a bit of digging we found a bug in eaccelerator that causes deadlocks when using SHM on the version we were running; in theory it’s fixed in the current stable. So I upgraded and re-enabled it, and the server started crashing again. Doh. Back off it came.
At this point “Bill” noticed that /news wasn’t rendering quite as much HTML as it should (at the very least it was missing a tag). A bit of digging through the logs revealed that we were hitting the memory_limit for that page and the proc was being abnormally terminated. We fixed that up, re-enabled eaccelerator, and it was stable.
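Spotting this in the logs is straightforward once you know to look: when memory_limit is exceeded, PHP logs an “Allowed memory size of N bytes exhausted” fatal error. A minimal sketch of scanning for it, with made-up sample lines (the paths, sizes, and line numbers are illustrative, not from the real incident):

```python
import re

# Made-up sample lines standing in for the error log.
error_log = [
    "[error] PHP Fatal error:  Allowed memory size of 134217728 bytes "
    "exhausted (tried to allocate 523800 bytes) in /var/www/news.php on line 42",
    "[notice] caught SIGTERM, shutting down",
]

pattern = re.compile(r"Allowed memory size of (\d+) bytes exhausted")
hits = [int(m.group(1)) for m in map(pattern.search, error_log) if m]
for limit in hits:
    print("memory_limit hit at", limit // (1024 * 1024), "MB")
```

With the sample above this prints `memory_limit hit at 128 MB`: one abnormally terminated request per matching line.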
Now I have another problem: I want to figure out how to reproduce this bug so I can lodge it with the nice eaccelerator people, but it only seems to exist on the production instance of this system. I can’t make it fail the same way anywhere else. I guess I’ll just have to hope someone else discovers the same thing and documents a sequence of events to reproduce it… I’m not holding my breath.