A Performance Crisis at the Caravan and Motorhome Club
A race against time to fix a new online booking system and avert a customer service crisis
Time to get online
For club members, booking a pitch had historically meant a call to the contact centre. The annual release of slots for the coming year brought a surge of people chasing the best pitches, and the contact centre had become a bottleneck, with members sometimes waiting days to make their booking.
This was the year that things were to change. Online bookings were the way forward. The aim was that anyone wanting to book would be able to do so on the first day of the annual release, as long as slots were available.
A new web site, a new integration layer and a modified contact centre application were all required, involving a lot of development from different teams. All needed to come together for this to succeed. What could possibly go wrong? It all needed to be ready for the annual release of slots on 8th December.
Ready or not?
The implementation was running late, with the 8th December fast approaching. There would be no quiet period ahead of the storm. However, the general feeling was that things would be ready in time.
Late in the day, it was decided that ultimate reassurance would come through one final check. The club’s IT department chose to commission a load test, just in case.
Urgent action needed
With just a few weeks to go, the club needed a supplier it could trust to move rapidly and decisively.
The club approached a management consultancy, which turned to SQC for its expertise in delivering load testing services.
Broad objectives
The requirement was not trivial. The club wanted assurance that both the web front end and the contact centre application would cope with the forecast demand and more.
Four primary objectives were agreed:
- An assessment of the performance, throughput, latency, and headroom of the new booking system when the web and call centre sales channels were both servicing a heavy demand.
- Tuning of the system to optimise its throughput and latency characteristics, eliminating any bottlenecks or configuration issues so as to make full use of the available platform capacity.
- Checking the reliability of operation of each system when it was under stress, making sure no errors crept in.
- Ensuring the system would survive the storm, standing up for an entire day, rather than failing after a few hours as a previous attempt had done.
Making it realistic
One significant challenge was devising a realistic model of the booking demand expected on the 8th December and then simulating the way this demand would present itself, both directly on the web interface and indirectly, via the contact centre operators, on the contact centre application. This was both an analytical and a creative endeavour.
Luckily, a comprehensive history of the nature and timing of bookings made over the previous year was available. Data for the first four days of bookings from the previous season were analysed to shape the simulated booking demand to be used for testing.
Testing in earnest
SQC built a test framework that could simulate the sequence of bookings seen the previous year: bookings of the same complexity and in the same order, but at an accelerated rate. The load test would apply four days’ worth of bookings in an eight-hour period. Importantly, SQC also developed a process to safely configure the live call centre system to permit out-of-hours load testing.
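The accelerated replay described above comes down to simple arithmetic: four days (96 hours) squeezed into eight hours is a twelve-fold speed-up. The following is a minimal sketch of that idea only, not SQC's framework. The function name `compress_schedule` and the sample timestamps are hypothetical and chosen purely for illustration.

```python
from datetime import datetime, timedelta

def compress_schedule(timestamps, source_span, target_span):
    """Map historical booking times to replay offsets, in seconds from test start.

    The original order is preserved; only the gaps between bookings shrink.
    """
    factor = source_span / target_span  # timedelta / timedelta gives a float
    ordered = sorted(timestamps)
    start = ordered[0]
    return [(ts - start).total_seconds() / factor for ts in ordered]

# Hypothetical sample of historical booking times (not real club data).
history = [
    datetime(2000, 1, 1, 9, 0, 0),
    datetime(2000, 1, 1, 9, 0, 12),   # 12 seconds after the first booking
    datetime(2000, 1, 1, 21, 0, 0),   # 12 hours after the first booking
    datetime(2000, 1, 5, 9, 0, 0),    # 4 days after the first booking
]

offsets = compress_schedule(history, timedelta(days=4), timedelta(hours=8))
for seconds in offsets:
    print(f"replay at +{seconds:.0f}s")
```

With these assumptions, a booking made twelve hours into the original release is replayed one hour into the test, and the last booking of the fourth day lands at the eight-hour mark.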
Testing began with everyone at the club expecting that everything would be fine. Once it was under way, all of the project reports went from green to bright red. Although the development team’s earlier testing had shown no issues, it was now clear that the system would not cope when the big day arrived. That day was less than two weeks away.
An intensive period followed, with problem solving by day and test runs by night. During the test runs SQC diagnosed instability and performance issues, which were investigated and traced to localised implementation issues. These were then passed on to the development teams, who worked wonders, turning out fixes in time for the changes to be tested overnight.
At first the load hit bottlenecks within the public-facing web tier. A trickle of the load made it through to the integration servers, but the majority of visitors to the site received errors and timeouts. Multiple iterations of web server configuration and infrastructure changes had to be made and tested. These improvements incrementally increased the flow through to the integration tier.
Next, captured bookings were slow to pass through the integration tier. They queued within it, with only a small steady flow making it out the other side and into the main application. It was only after these issues were cleared that the full intensity of the load hit the back-end of the contact centre application, causing trouble there.
The end game
When the full load reached the contact centre application, it too staggered. Unable to keep up with the bookings from the web tier, everything slowed down. Response times for the call handlers shot up.
It was now the weekend before the 8th December, just a couple of days before go-live and the annual release of slots for the next year. SQC were on site over the weekend, working with the club’s Chief Information Officer (CIO), who was dedicating his own time to the issue.
Just after midnight on the Saturday, SQC identified where the problem lay. Deep within the contact centre system’s database, the allocation of a unique primary key from a sequence was the bottleneck, one that eventually made the system grind to a halt. With no support available from the vendor, the challenge became what could be done, immediately, to address this issue and allow further testing. Waiting for the vendor was unacceptable. Everything needed to be working by the end of Sunday.
Late night surgery
With normal database tuning techniques exhausted, a more radical solution was needed. The CIO had access to the database transaction code, which opened the door; SQC had the idea that led to the solution.
A 2am rewrite of the code that allocated the primary key replaced the use of a database sequence. Safe generation of random values removed the serialisation and queuing of requests, and saved the day. With this patch applied, online bookings flowed smoothly without impacting call handling performance.
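The principle behind the patch can be illustrated with a short sketch. It is not the code written that night: the actual database, schema and transaction code are not described here, so the example uses an in-memory SQLite database as a stand-in, and the table `booking` and function `insert_booking` are hypothetical. It also assumes that "safe" means uniqueness is still enforced by the primary key constraint, with a fresh random value drawn on the rare occasion of a collision.

```python
import secrets
import sqlite3

KEY_BITS = 63  # assumption: keys must fit in a signed 64-bit integer column

def insert_booking(conn, member, max_attempts=5):
    """Insert a row under a randomly chosen primary key, retrying on collision.

    No shared counter is consulted, so concurrent writers do not queue
    behind one another to obtain their next key.
    """
    for _ in range(max_attempts):
        key = secrets.randbits(KEY_BITS)
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO booking (id, member) VALUES (?, ?)",
                    (key, member),
                )
            return key
        except sqlite3.IntegrityError:
            continue  # key already taken: draw another random value
    raise RuntimeError("could not allocate a unique key")

# Stand-in database; the real system's schema is unknown.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE booking (id INTEGER PRIMARY KEY, member TEXT NOT NULL)")

keys = [insert_booking(conn, f"member-{i}") for i in range(1000)]
print(len(set(keys)), "unique keys allocated")
```

The trade-off in a design like this is that keys no longer reflect insertion order, and random keys tend to scatter index writes. In exchange, writers no longer contend for a single sequence value.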
The annual release of slots on 8th December came and went without incident. The new solution, with the patch in place, withstood the storm and happy caravaners made lots of online bookings, for the first time.
The Caravan and Motorhome Club is the UK’s biggest touring community, helping caravaners, motorhomers and campers access over 3,000 locations in the UK and Europe.
Established in 1907, today the club represents the interests of over 1.1 million caravan, motorhome, campervan and trailer tent owners, and it prides itself on offering them great value and high-quality campsites.
A successful business, with a £100+ million turnover and 1,200+ employees, the club offers over 30,000 pitches every night, or 18 million ‘sleeps’ a year, making it the biggest ‘hotel chain’ in the UK.