
M&E Journal: Continuity: Things We Learned from the Fire

By Damien Carroll, Chief Operating Officer, Sohonet

On an early morning last September, Sohonet’s NOC responded to alarms showing that office equipment at Sohonet HQ had gone offline in stages. An on-call engineer was dispatched and, on arrival, found that the London Fire Brigade had five fire engines and 28 firefighters at and around our building on Soho Street in London. Our offices were on fire.

Many of Sohonet’s senior leadership had returned the previous evening from the IBC show in Amsterdam, or were planning to travel back that afternoon. The engineer quickly worked through the chain of command, alerting technical and operational management, including the Chief Operating Officer. The support hand-off from the Los Angeles team had happened only a few hours earlier, but they were alerted too, in case they were needed.

All of Sohonet’s technical infrastructure, apart from some R&D systems, was housed, by design, in one of several datacenters and dozens of points of presence (POPs) around the London metro area and globally.

By the time the fire had been put out and anyone could enter the building, the main floor of our offices had been gutted. But Sohonet customers around the world in Europe, North America and Australia were going about their business uninterrupted.

Only our customers in the same building as us, and those whose daily commute took them past, were even aware that anything unusual had happened.

Section 1: Triage

Definition: (in medical use) the assignment of degrees of urgency to wounds or illnesses to decide the order of treatment of a large number of patients or casualties.

When our COO was notified by the on-call engineer about the fire, his first concern was for the safety of the Sohonet team and anyone else who might be in the building. Because the hour was early, it would have been easy to assume that no members of our staff had been in the office, but the on-call engineer and support team leads contacted the team anyway. All personnel who were not on holiday were contacted immediately and diverted to the locations defined in the basic disaster recovery plan for times when the office is unavailable for a short period.

An immediate operational crisis conference call was convened and, based on the information available, it was decided to enact the Level 1 disaster recovery (DR) plan.

Once this had been done, the main priority was to ensure that the support team and other core customer-facing staff had a place to work. Maintaining our service levels was a core goal of the DR plan.

The majority of the London-based support team was diverted to our offices at Pinewood Studios in Iver Heath, where some of them would be based until the situation on Soho Street could be assessed. This office is fully configured with infrastructure and technology to ensure an engineer has the same functional access to all systems as they would at the HQ Network Operations Center.

Beyond this, key engineering personnel have tools, systems and security protocols available to them to enable access to the network and most systems from any remote location with a decent internet connection.

In the meantime, regular and detailed communication was going out to the staff while the team still on the ground in Amsterdam received regular updates—today was going to be a little bit different than planned.

Step: Account for your team’s safety

Every company location should have an emergency response plan that instructs employees where to go in the event of fire, earthquake, flood, or other disaster.

Leaders should be trained and work with their groups to review exits, establish rendezvous points and determine how each group will communicate in the event it needs to leave the building. There are a number of helpful government resources available to assist businesses in creating their own plans.

Having a plan to stop people from coming to the building and to divert them elsewhere is equally important.

Step: Ensure support continuity

Like most businesses, Sohonet relies on a mix of physical infrastructure, network access, systems and applications. Continuity of access to each of these elements is critical to ensuring a smooth operation.

Step: Who is needed on the scene?

Other than operations and engineering personnel needed on-site, nearly all London staff were asked to work from home until operations leadership could assess the situation and identify alternative office space. Starting on the day of the fire:

*Support team worked from our Pinewood location
*Engineering team found co-working space for hire
*Finance worked from home
*A mixed discipline group worked from a nearby coffee shop to ensure that developments on the ground could be managed and communicated.

Section 2: Internal communication

Within hours of the fire, our entire company was aware of the situation and many around the world had already begun assisting the London team.

1. Ensure that the entire team is aware of what is happening. Not everyone needs to know everything, but consistent, regular and detailed communication ensures that the team will focus on what is essential instead of worrying about what hasn’t been said.

2. Establish regular check-ins. Our COO took the lead in communicating through a number of channels, mainly Slack, SMS, phone and email, to keep key team members connected and on task. Summary emails were sent to all hands each day to ensure all of Sohonet was aware of what was happening in London.

Section 3: Customer communication

Our most obvious concern was reassuring our customers that our situation did not present a risk to their services. We knew that only the customers in our own building had been affected (because power was out throughout the building), and we had already spoken with their technical teams. But the M&E community in Soho is like a small town. People had heard about the fire. Concerned customers and colleagues were calling. We needed to be proactive.

1. Make sure that affected customers are notified and given direct access to updates and information.

2. Be precise and complete in your communication with all customers; you need to set expectations for how the event will affect their service.

We knew that we needed to talk to our customers and let them know what was happening. The response was both amusing and heart-warming. Most were unaware that anything had happened and all were supportive.

Section 4: Clean-up and restart

This is a story with a happy ending, just in case you were wondering. But there was much to do to get us there.

1. Once it was clear we would not be returning to our offices any time soon, the team switched gears from short-term planning to planning for being out of the building for an extended period. Immediate work began (by midday on day one) to evaluate whether further space was available in this location to accommodate all personnel.

2. The technical and operations teams quickly established the minimal set of requirements for getting staff back up and functioning at an acceptable level (switches/cables/monitors/phones) and identified any stock/spares that had been lost.

3. The legal and finance teams engaged with our insurers to inform them and collaborate with them on the process. Having a good relationship with your insurance brokers, and ensuring your insurers are well and accurately informed, is a key component in minimizing financial pain during such an event.

4. Make considerations for employee morale and productivity. The old office was our home. Many people had personal items at the office or had invested time in making their workspace effective and comfortable for themselves. (Not to mention Terry the plant, the only part of the Sohonet team to have been in the office when the fire broke out.)

Section 5: Establish the new normal

On Monday, September 12, we awoke to news of the fire. By Wednesday morning of the same week, we were in our new temporary offices, getting back to business. Over the course of the following months, we combined and recombined suites within our temporary space until we could find a single large space where we could return to some semblance of the configuration that had worked for us in our headquarters.

This space is smaller and a bit noisier than we’d like, and there aren’t enough meeting rooms, but the team has made do until we can return to a newly rebuilt and remodeled 5 Soho Street, which was scheduled to happen on March 20, just over six months to the day from our fire.

Key things we’ve learned:

*Great people will respond to challenging times and will pull together, bond and become even closer. Sept. 12 was a watershed: “Were you here when the fire happened?” March 20 will be another: “Were you here when we got back?”

*The systems we put in place, and our focus on keeping any system or tool critical to our customers’ services out of our office space, ensured that our customers were unaffected; that was one less worry to deal with. Relying, by design, on cloud-based tools and offsite applications meant that we could get our people back to normal quickly and ensure they could keep servicing our customers’ business as usual within a short period of time.

*Institutionalizing the processes and procedures required for disaster recovery makes dealing with a crisis a lot more normal. We are immensely proud of the work our team did over those initial days and of their approach and attitude to each challenge we faced. No management team could have done this on its own; it took a company.

*Have good records of what you have and what is most needed. Have a great insurance company and invest time with them to ensure they understand your business. When you need them, you want them to know who you are and to be there to support you.

Conclusion:

An event such as the fire is not something we would have wished for. But now, as we prepare to move back into our London headquarters, we are proud of our team and what we have learned.
