5 Steps to Better Business Service Quality
Is your IT team struggling just to keep the lights on? You’re not alone. Many IT organizations are plagued by business service outages. The result? They’re in constant firefighting mode. When a critical business service – such as an e-commerce portal, warehouse management system or call center – goes down, every minute counts. The cost of business service outages can be crippling – hundreds of thousands of dollars or more every hour in some cases – so the pressure to get services back up and running is enormous.
There has to be a better way. What if you could actually prevent service outages and fix them more quickly and easily when they happen? Well, you can. Here are five things you can do to deliver better business service quality.
1. Map your business services
We’ve all been there. A business service goes down, and the next thing you know you’ve got people pointing fingers at each other in the war room. It’s the load balancer. No, the application isn’t configured correctly. Wait, perhaps we’ve got an intermittent problem on the VLAN. And so on. No one really understands what’s going on, everyone’s an expert and you’re wasting valuable time.
Here’s why this happens. IT departments often don’t really understand how their infrastructure is connected – and how it delivers business services. Sure, they may know that a particular database runs on a particular server, but they don’t know how things are connected end to end. If a business service is down, does it even use that failed communications link or web server? There are just too many red herrings – and meanwhile, the clock is ticking.
Here’s the key. Map your business services up front. Don’t wait until the alarms go off. It’s an investment of time and effort, but you’ll get an enormous payback. Instead of pointing fingers, you’ll get to the root cause – which means that you spend precious time fixing the issue, rather than trying to find it.
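To make this concrete, here is a minimal sketch of what a service map buys you in the war room. It assumes the map is stored as a simple dependency dictionary; the component names and the `components_used_by` and `is_affected` helpers are hypothetical, invented purely for illustration rather than taken from any particular product.

```python
from collections import deque

# Hypothetical service map: each key depends directly on the listed components.
SERVICE_MAP = {
    "ecommerce-portal": ["web-server-1", "load-balancer-1"],
    "web-server-1": ["app-server-1"],
    "app-server-1": ["orders-db", "vlan-20"],
    "warehouse-management": ["app-server-2"],
    "app-server-2": ["inventory-db", "vlan-30"],
}


def components_used_by(service, service_map):
    """Return every component the service depends on, directly or indirectly."""
    seen = set()
    queue = deque(service_map.get(service, []))
    while queue:
        component = queue.popleft()
        if component in seen:
            continue  # already visited; also protects against cycles
        seen.add(component)
        queue.extend(service_map.get(component, []))
    return seen


def is_affected(service, failed_component, service_map):
    """True if the failed component sits anywhere in the service's dependency chain."""
    return failed_component in components_used_by(service, service_map)
```

With a map like this, the question "does the e-commerce portal even use that failed VLAN?" becomes a lookup such as `is_affected("ecommerce-portal", "vlan-30", SERVICE_MAP)` rather than a debate.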
2. Make change management a top priority
What’s the most common reason for business service outages? It’s not software bugs, equipment failures or overloaded servers. Change is the culprit. When things change, they can break – and often do. Whether it’s misconfiguring a load balancer, upgrading software or something else, every change can have unintended and disastrous consequences.
That’s why having a strict change management process is so important. There’s no use closing the barn door after the horses have escaped – you need to review each change up front, and make sure that it’s not going to affect your business services. If you don’t have a CAB – change advisory board – then make setting one up a top priority.
This is another reason why mapping your business services is so important. Unless you understand how changes are going to affect your business services, you’re flying blind. No amount of process is going to save you. Unless you know which business services depend on that storage array you’re about to upgrade, you’ve no way of telling whether you are going to cause a service outage. Knowing how your infrastructure is connected also helps you to identify dependencies between IT components – for instance, knowing that you need to make cascading changes to maintain version compatibility when rolling out a software upgrade.
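As a rough illustration of the storage array question, the sketch below checks a proposed change against a service map before it goes to the CAB. It assumes each business service has already been mapped to the full set of components it relies on; the names in `SERVICE_COMPONENTS` and the `services_impacted_by` helper are hypothetical.

```python
# Hypothetical map: business service -> every component it depends on end to end.
SERVICE_COMPONENTS = {
    "ecommerce-portal": {"load-balancer-1", "web-server-1", "orders-db", "storage-array-a"},
    "warehouse-management": {"app-server-2", "inventory-db", "storage-array-a"},
    "call-center": {"pbx-1", "crm-db", "storage-array-b"},
}


def services_impacted_by(change_targets, service_components):
    """Return the sorted names of services that use any component being changed."""
    targets = set(change_targets)
    return sorted(
        service
        for service, components in service_components.items()
        if components & targets  # non-empty intersection means the service is exposed
    )
```

Here `services_impacted_by(["storage-array-a"], SERVICE_COMPONENTS)` would flag the e-commerce portal and warehouse management, while the call center is untouched. An empty result is only as trustworthy as the map behind it.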
3. Filter out the noise
Noise is one of the biggest problems that IT operations faces. A single issue can generate thousands of events, and correlating these events is a huge task – leading to delays and extended service outages. There’s also a constant background hum of events, which often makes it difficult to even know that you have an issue.
Traditional event management systems come with filtering engines, but you have to define all the filtering rules yourself. Building these rules is a long, painstaking process – and you’ll only ever cover a small percentage of failure scenarios.
On the other hand, there are now advanced technologies that use automated intelligence to correlate events. These technologies can often identify the root cause of business service issues in 30 seconds or less, compressing up to 100,000 events into a single actionable incident. This type of technology can also identify potential business service issues, so that you can take corrective action before users are affected.
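To give a feel for what correlation means, here is a deliberately simple sketch. It is not how any commercial correlation engine works; it just assumes that events touching the same business service within a short time window belong to one incident. The `Event` class, the `COMPONENT_TO_SERVICE` lookup and the 60-second window are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: float  # seconds since some reference point
    component: str
    message: str


# Hypothetical lookup derived from a service map: component -> business service.
COMPONENT_TO_SERVICE = {
    "web-server-1": "ecommerce-portal",
    "orders-db": "ecommerce-portal",
    "pbx-1": "call-center",
}


def correlate(events, component_to_service, window_seconds=60.0):
    """Group events into incidents by service, chaining events that arrive
    within window_seconds of the previous event for the same service."""
    incidents = []
    latest = {}  # service -> most recent incident opened for that service
    for event in sorted(events, key=lambda e: e.timestamp):
        service = component_to_service.get(event.component, "unmapped")
        current = latest.get(service)
        if current is not None and event.timestamp - current["events"][-1].timestamp <= window_seconds:
            current["events"].append(event)
        else:
            current = {"service": service, "events": [event]}
            incidents.append(current)
            latest[service] = current
    return incidents
```

Even a crude rule like this turns a stream of raw events into a handful of incidents per service. Real products go much further, but the principle of using the service map to decide which events belong together is the same.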
4. Automate
For most IT organizations, resolving incidents is an enormously manual process. Whether it’s going out and collecting log files, reconfiguring equipment, or patching software, most of these activities are still usually done by hand. This is a huge waste – both of time and effort. Instead of spending their valuable time actually fixing business service issues, subject matter experts end up working on mundane activities that can easily be automated. As a result, it takes longer to resolve service outages – and resources that could be doing high-value work instead spend all of their time keeping the lights on.
That’s why it makes sense to invest in orchestration. With an orchestration engine, you can automate a significant portion of your remediation activities so that you resolve business service issues more quickly and accurately. Ideally, look for an orchestration engine that integrates tightly with your event management and incident management platform so that you can automatically trigger appropriate activities – such as log file collection – when an incident occurs.
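The idea of triggering activities automatically when an incident occurs can be sketched in a few lines. This is only a toy model of an orchestration engine: the `runbook` decorator, the incident fields and the `"web-server-down"` category are hypothetical, and the steps return descriptive strings instead of touching real systems.

```python
# Hypothetical runbook registry: incident category -> ordered list of automated steps.
RUNBOOKS = {}


def runbook(category):
    """Decorator that registers a function as a step for the given incident category."""
    def register(func):
        RUNBOOKS.setdefault(category, []).append(func)
        return func
    return register


@runbook("web-server-down")
def collect_logs(incident):
    # Placeholder: a real step would pull log files from the affected host.
    return f"collected logs from {incident['component']}"


@runbook("web-server-down")
def restart_service(incident):
    # Placeholder: a real step would call your configuration or remote-execution tooling.
    return f"restarted service on {incident['component']}"


def on_incident(incident):
    """Run every step registered for the incident's category, in order, and return the results."""
    steps = RUNBOOKS.get(incident["category"], [])
    return [step(incident) for step in steps]
```

When your incident management platform raises an incident, it would call something like `on_incident({"category": "web-server-down", "component": "web-server-1"})`, so the log files are already waiting by the time an engineer picks up the ticket.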
5. Drive continuous improvement
Problem management is perhaps one of the most misunderstood ITIL processes. Search the web and you’ll find statements such as “an incident is typically caused by one or more problems” – perfectly valid, but not particularly helpful. Here’s what problem management is really about – driving continuous improvement in service quality. It’s not just about finding the root cause of multiple similar incidents. It’s about preventing these types of incidents from happening again.
By putting in place a structured problem management process, you can proactively identify and mitigate systemic issues that affect service quality – whether these are problems with a specific hardware vendor or software release, the failure of a cloud provider to meet SLAs, or a plethora of other possibilities. You’ll not only improve service quality, you’ll reduce your IT operations costs – and that’s a real win-win for both your IT team and your business.
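One simple way to start spotting systemic issues is to count how often the same attribute shows up across past incidents. The sketch below assumes each incident record carries fields such as `vendor` and `release`; the sample records, the `recurring_causes` helper and the threshold of three are hypothetical.

```python
from collections import Counter

# Hypothetical incident history exported from an incident management tool.
INCIDENTS = [
    {"id": 1, "vendor": "vendor-a", "release": "2.1"},
    {"id": 2, "vendor": "vendor-a", "release": "2.1"},
    {"id": 3, "vendor": "vendor-b", "release": "3.4"},
    {"id": 4, "vendor": "vendor-a", "release": "2.0"},
    {"id": 5, "vendor": "vendor-b", "release": "3.4"},
]


def recurring_causes(incidents, field, min_count=3):
    """Return (value, count) pairs for field values seen in at least min_count incidents."""
    counts = Counter(incident[field] for incident in incidents if incident.get(field))
    return [(value, n) for value, n in counts.most_common() if n >= min_count]
```

Running `recurring_causes(INCIDENTS, "vendor")` on this sample singles out `vendor-a`, which is the kind of signal that should open a problem record instead of yet another one-off fix.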
To learn more about how the Optanix platform can help you maximize digital value for your enterprise, contact us today!