This is the Message Centre for Jim Lynn

Unleash

Post 1

Jim Lynn

When I worked at Computer Concepts, writing applications for the Acorn Archimedes, people would often remark on their speed and ease of use, and wonder what techniques we used to achieve these results.

Someone (possibly Sean, U7, although it might have been Tim, U1) suggested we write an article for our in-house magazine revealing that our secret was the undocumented 'unleash' command in the ARM processor, which unleashes the raw power of the ARM. But we never did.

I feel slightly as if we've just used the 'unleash' switch on the DNA servers. I don't know if anyone's noticed, but in the last few days, the servers have been running really, really fast. I compared logs from yesterday to logs from two weeks ago, and response times have improved nearly threefold: the average has dropped from 2.13 seconds to 0.77 seconds. And, somewhat paradoxically, over the same period the number of requests has tripled (due to all the new messageboards moving to the DNA servers).

There's a slightly frustrating aspect to this, though. It goes back to when we moved to using Single-Sign-On. For SSO we have to talk to the membership database to recognise you each time you fetch a page, which requires a connection to the SSO database. When SSO launched, we had 'connection pooling' code, which would keep a pool of these connections, and reuse them as necessary, but on activating the service, the SSO database promptly fell over because it couldn't cope with all the connections the DNA servers were creating and holding open. So, just to get the site back up and running, we disabled the connection pooling, so that we'd only create connections when we needed them. This solved the problem, and the site was back up and running.
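For anyone who hasn't met connection pooling, here is a minimal sketch of the idea in Python. It is not the DNA code; the `ConnectionPool` class, the `connect` factory and the `max_size` parameter are all names made up for illustration. The important property is the cap: the pool never holds more than `max_size` connections open, which is exactly the limit that matters if the database on the other end can only cope with so many.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Bounded pool: never more than max_size connections open at once."""

    def __init__(self, connect, max_size):
        self._connect = connect  # hypothetical factory that opens one new connection
        # LIFO so the most recently returned connection is reused first.
        self._slots = queue.LifoQueue(maxsize=max_size)
        for _ in range(max_size):
            self._slots.put(None)  # None marks a slot with no connection opened yet

    @contextmanager
    def connection(self, timeout=None):
        # Blocks while every slot is in use; raises queue.Empty on timeout.
        conn = self._slots.get(timeout=timeout)
        if conn is None:
            try:
                conn = self._connect()  # open lazily, only when a slot needs it
            except Exception:
                self._slots.put(None)  # give the empty slot back on failure
                raise
        try:
            yield conn
        finally:
            self._slots.put(conn)  # hand the connection back for reuse
```

A request handler would wrap its database work in `with pool.connection() as conn:`. The sketch leaves out things a real pool needs, such as detecting and replacing connections that have gone stale.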

However, I kept noticing that the connections to SSO would occasionally take a long time - several seconds. Not all the time, but enough to be annoying. Until recently, though, it hadn't been a problem.

Recently, however, we've been worrying a lot about performance, so we put in some more monitoring, and discovered something rather odd. The graphs were showing a strangely regular pattern - a flat line for a time, then a spike as requests built up on the server, then a sudden drop as all those requests finished almost simultaneously.

We then put together a test program which simply made and broke connections over and over again, displaying how long it took. And it became clear that there was this regular delay. Every 16 seconds it would wait for about 5 seconds before completing the connection. Every 16 seconds.
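A test program of that sort can be sketched in a few lines of Python. This is an illustration rather than the program we actually ran: it assumes a plain TCP connect is a fair stand-in for opening a database connection, and the function names, the `pause` between attempts and the `threshold` for counting an attempt as slow are all invented for the example.

```python
import socket
import time


def time_connections(host, port, attempts, pause=0.1, timeout=30,
                     connect=socket.create_connection):
    """Open and close a connection repeatedly, timing each attempt.

    Returns a list of (seconds_since_start, seconds_taken) pairs.
    """
    results = []
    start = time.monotonic()
    for _ in range(attempts):
        began = time.monotonic()
        conn = connect((host, port), timeout=timeout)  # make the connection...
        conn.close()                                   # ...and break it again
        ended = time.monotonic()
        results.append((began - start, ended - began))
        time.sleep(pause)
    return results


def slow_attempts(results, threshold=1.0):
    """Keep only the attempts that took at least `threshold` seconds."""
    return [(offset, took) for offset, took in results if took >= threshold]
```

If the slow attempts turn up at evenly spaced offsets, you are looking at a periodic stall rather than random network noise.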

Now, this was slightly worrying. We were days away from moving the BBC's biggest and most fearsome messageboard onto DNA - 606, a sport board, which, going by its current traffic, was likely to double our load. This, on top of these SSO delays, was threatening to cripple us.

So we went back to the connection pooling code we'd written originally, made sure it still made sense, and set the pool size to something reasonable so as not to overwhelm SSO. (They'd increased their maximum number of connections several times since launch, so we were happy we wouldn't have a repeat of the first time.)

I tested the change on the staging server and it worked OK, so, very late at night (when server load is light), I put the changes live.

The result was dramatic. Suddenly, pages were returning immediately. All of them. Where there used to be delays, now pretty much every page was coming back immediately.

The graphs were showing that all the spikes we'd been seeing were gone. Which was nice. But that was late at night with quite low traffic. Would it still behave that way in the morning? Would it somehow be worse when the relentless hordes of footballdom descended later in the day?

It's been fine ever since. We watched Who's Online hit 1005 on Wednesday with no noticeable effect on speed. Then it got close to 1400 yesterday and only slightly less today, still with the servers zipping along.

I don't want to jinx it, since 606 haven't fully moved across yet, so there might be more load to come, but right now, it's looking quite promising. Even at our busiest time, the database server is only running somewhere between 5 and 10% CPU capacity. And all this is without applying all the database optimisations which (according to my tests) could speed up the database by a factor of two or three.



Post 2

Mikey the Humming Mouse - A3938628 Learn More About the Edited Guide!

YAY for Jim!

I admit, when I finally get a massive database query to the point where it is suddenly debugged and running smoothly, I'm often afraid to run it again just in case it doesn't work the same the next time....

Of course, I'm working with far crappier servers than y'all, I would bet.



Post 3

Kat - From H2G2

I had noticed that pages and searches were coming back faster. I couldn't decide if it was because my internet was being cooperative or if something sneaky was being done to the server. I'm, admittedly, slightly disappointed to find it's not my internet and it won't be faster everywhere now smiley - biggrin

Well done for making us all zip along! Let's hope complaints about other things don't start flooding in! smiley - cheerup

Kat



Post 4

Jim Lynn

It's still not perfect. There are still other things that can cause problems. Search, for example, still has an occasional tendency to hang up, leading to hang-ups across the system, and if SSO decides to go slow, that affects every page. So there's still plenty to do.



Post 5

Frankie Roberto

Indeedy, well done Jim. The sites do seem to be running pretty speedily to me.



Post 6

Jimi X

Good on ya' Jim!

So far so good eh? I saw 1600 online just a tick ago and no major problems, so that's something innit?



Post 7

Jim Lynn

It's rather mind-boggling, I have to say. The database is currently handling around 530 stored procedures a second, which is slightly scary because when I load tested it, I couldn't get it to do more than around 300 sp/s. But there are a few differences between the way we're talking to the database and the way I load tested it which could explain the discrepancy.



Post 8

Whisky

You've created a monster Jim smiley - winkeye

If you log on tomorrow morning and your workstation insists on calling you Professor Falken and asks you if you want to play a game - I'd suggest running like hell!

smiley - run

