Data centre monitoring is essential to ensure quality and continuity of service, but it stops being useful if technology support teams are overwhelmed by irrelevant alerts. Here are four ways organisations can cut through the data noise when monitoring data centres.
1. Monitoring needs to be concise, precise and relevant
Monitoring tools need to be concise and precise, and they need to lead technology professionals directly to the causes of outages or other issues, regardless of whether those issues trace back to servers, networks or any other item of data centre infrastructure.
If a monitoring tool generates a large number of irrelevant alerts, most technical support staff will start to ignore them. Apathy sets in: staff stop taking even urgent alerts seriously, and the alerts that really matter get buried under a mountain of irrelevant information. These teams are already overloaded with work and stress, and the data noise only adds to that pressure.
In addition, data centre monitoring must provide the precise data analysts need, when they need it, particularly about urgent issues. It should sound the alarm only over genuine problems or threats, not insignificant issues.
It's also important to be able to consolidate multiple alerts about the same issue into one concise alert, for instance where a device or port might be up, then down, then up again, so analysts are not overwhelmed by multiple warnings regarding the same problem.
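As a rough illustration of this kind of consolidation, the sketch below merges repeated up/down alerts for the same device and port into a single record when they arrive close together. The `Alert` fields, the `consolidate` function and the 300-second window are all hypothetical choices made for the example, not the behaviour of any particular monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical raw alert as a monitoring tool might emit it.
    device: str
    port: str
    state: str        # "up" or "down"
    timestamp: float  # seconds since some reference point


@dataclass
class ConsolidatedAlert:
    device: str
    port: str
    first_seen: float
    last_seen: float
    transitions: int  # how many raw alerts were folded into this one
    last_state: str


def consolidate(alerts, window=300.0):
    """Merge alerts for the same device/port that arrive within
    `window` seconds of the previous alert for that device/port."""
    open_by_key = {}
    result = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.device, alert.port)
        current = open_by_key.get(key)
        if current is not None and alert.timestamp - current.last_seen <= window:
            # Same flapping episode: update the existing record.
            current.last_seen = alert.timestamp
            current.transitions += 1
            current.last_state = alert.state
        else:
            # New episode: start a fresh consolidated alert.
            current = ConsolidatedAlert(
                alert.device, alert.port,
                alert.timestamp, alert.timestamp, 1, alert.state,
            )
            open_by_key[key] = current
            result.append(current)
    return result
```

With this approach, a port that goes down, up, down and up again within a few minutes produces one alert recording four transitions, rather than four separate warnings. A sensible window length depends on the environment and would need tuning.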
2. Use the data reporting to enable better service quality
Effective data reporting allows IT professionals to analyse trends in their data centres and decide where more capacity is needed. This lets them be more proactive: they can detect problems before they affect customers, help avoid downtime and fulfil service level agreements (SLAs).
Being able to highlight looming issues via this type of trend analysis helps organisations to provide a better-quality experience to their data centre users and customers. Aligning the performance data with business metrics helps them to identify what really matters and allows them to make informed investment decisions based on the potential business impact.
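One simple form of this trend analysis is to fit a straight line to recent utilisation figures and estimate when a capacity threshold will be reached. The sketch below assumes one utilisation reading per day (as a fraction between 0 and 1) and an 80% threshold; the function name, the threshold and the choice of a linear fit are illustrative assumptions, and real capacity planning would usually account for seasonality as well.

```python
def days_until_threshold(daily_utilisation, threshold=0.8):
    """Estimate how many days after the latest reading a linear trend
    reaches `threshold`. Returns None if utilisation is flat or falling."""
    n = len(daily_utilisation)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")

    # Ordinary least-squares fit of utilisation against day index 0..n-1.
    mean_x = (n - 1) / 2
    mean_y = sum(daily_utilisation) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(daily_utilisation))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    if slope <= 0:
        return None  # no upward trend, so no projected crossing

    crossing_day = (threshold - intercept) / slope
    # Express the result relative to the most recent reading.
    return max(0.0, crossing_day - (n - 1))
```

A result of around 11, for example, would suggest roughly 11 days of headroom at the current rate of growth, which gives a team time to add capacity before customers are affected.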
3. Data centre monitoring has to provide fully automated 360-degree visibility
Managing data centre infrastructure is challenging because it is often extremely complex. It can feature a hybrid architecture spanning multiple data centres and cloud systems, and teams need visibility of each one on its own as well as of the data paths and connections between all of them.
Data centres are also very dynamic and subject to minute-by-minute change. Equipment is continually being added or removed, which means that hardware then has to be reconfigured.
Interconnections are often reconfigured too, because all the devices are interconnected and those connections can change. For instance, servers might be moved from one switch to another. End users connected on the access layer of the network may also move around.
What this means is that data centre monitoring tools must be equally dynamic: able first of all to map all assets, but equally able to track changes as and when they occur, so that genuine anomalies can be identified.
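To make the idea of tracking changes concrete, the sketch below compares two snapshots of a topology map and reports which devices were added, removed or moved to a different connection. Representing each snapshot as a dictionary from device name to its switch and port is an assumption made purely for illustration; real discovery tools hold much richer models.

```python
def diff_topology(previous, current):
    """Compare two snapshots mapping device name -> (switch, port).

    Returns three dictionaries: devices added, devices removed, and
    devices whose connection changed (old and new connection)."""
    added = {d: current[d] for d in current.keys() - previous.keys()}
    removed = {d: previous[d] for d in previous.keys() - current.keys()}
    moved = {
        d: (previous[d], current[d])
        for d in previous.keys() & current.keys()
        if previous[d] != current[d]
    }
    return added, removed, moved


# Hypothetical snapshots taken a few minutes apart.
before = {
    "server-a": ("switch-1", "port-3"),
    "server-b": ("switch-1", "port-4"),
    "server-c": ("switch-2", "port-1"),
}
after = {
    "server-a": ("switch-1", "port-3"),
    "server-b": ("switch-2", "port-7"),  # moved to another switch
    "server-d": ("switch-2", "port-2"),  # newly added
}
added, removed, moved = diff_topology(before, after)
```

Here `server-b` shows up as moved, `server-d` as added and `server-c` as removed. Knowing that a change was expected, such as a planned move, is what lets a tool separate routine reconfiguration from a genuine anomaly.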
To tackle this ongoing data centre complexity, organisations should start to think about the role automation might play in cutting through data noise and in identifying and fixing even the smallest technology issues before they affect users.
4. Understand data centre traffic patterns to avoid bottlenecks
Understanding patterns of traffic, hour by hour and week by week, allows a dynamic threshold to be generated for a typical hour's, day's or week's traffic across the data centre infrastructure. Significant deviations can then be highlighted automatically, while the variations that would be expected during a normal working day are taken into account.
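A minimal sketch of such a dynamic threshold follows, assuming traffic history is available as pairs of hour-of-week slot (0 to 167) and measured value. It learns a mean and standard deviation per slot and flags a reading only when it falls well outside what is normal for that particular hour. The function names, the three-sigma band and the `min_band` floor are illustrative assumptions, not a description of how any specific product auto-tunes.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_baseline(samples):
    """Build a per-slot baseline from (hour_of_week, value) pairs.

    Returns a dict mapping slot -> (mean, standard deviation)."""
    by_slot = defaultdict(list)
    for slot, value in samples:
        by_slot[slot].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in by_slot.items()}


def is_anomalous(baseline, slot, value, k=3.0, min_band=1.0):
    """Return True if `value` deviates from the slot's mean by more than
    k standard deviations (or `min_band`, whichever is larger)."""
    if slot not in baseline:
        return False  # no history for this slot, so nothing to compare with
    mu, sigma = baseline[slot]
    band = max(k * sigma, min_band)
    return abs(value - mu) > band


# Hypothetical history: slot 9 is a busy Monday morning hour,
# slot 3 is a quiet overnight hour (values in Mbit/s).
history = [(9, v) for v in (100, 110, 90, 105, 95)]
history += [(3, v) for v in (10, 12, 8, 10)]
baseline = build_baseline(history)
```

With this baseline, 100 Mbit/s is unremarkable on a Monday morning but is flagged in the overnight slot, which is the point of a dynamic threshold: the same reading can be normal at one time and a significant deviation at another.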
This kind of auto-tuning is based on data that can also be queried manually to determine the causes of unusual or unexpected events. Having that information at a data centre professional's fingertips could, for example, highlight a routing issue, which can then be fixed, saving the cost of a bandwidth upgrade.
The take out
The vast amount of data noise bombarding IT teams has been exacerbated by the rapid acceleration of cloud technology implementation over the past two years due to the pivot to remote working during the pandemic.
Organisations need to make sense of the data noise to avoid running into adverse operational conditions caused by their data centres. Those in highly regulated industries such as finance and healthcare need to make periodic data centre risk assessments and disaster testing a part of their routine operations.
Risk mitigation for IT infrastructure is a shared responsibility, not just the CIO's or CTO's. Organisations need an appropriate number of IT staff who are trained and willing to do what it takes to stay on top of data centre operations and to make sense of the data their monitoring system provides.