Monitoring Support Services for Provider of HR Solutions in Cloud

Client Profile

The company provides cloud-based HR solutions using the software-as-a-service (SaaS) model. It is a subsidiary of a leading global multinational corporation that specializes in enterprise software for managing business operations and customer relationships.

Client Requirements

The client has an extensive infrastructure of over 16,000 servers. It engaged Veritis for system administration, the implementation of monitoring tools, and the troubleshooting of technical issues.

The key requirements included stabilizing the Splunk environment and fixing its missing data, which is crucial for troubleshooting customer issues, as well as implementing Hadoop.

Challenges

1) Missing Data in Splunk Environment

One of the primary challenges was stabilizing and addressing missing data in the Splunk environment, which was critical for troubleshooting customer issues. Missing or incomplete logs hindered the client’s ability to monitor and resolve system problems effectively, creating potential delays in service delivery.

2) Centralized Tool Configuration Across Multiple Data Centers

The client’s infrastructure spanned multiple data centers, which made implementing a centralized deployment model for monitoring tools complex. Configurations had to be synchronized across several locations, making it difficult to keep system administration and monitoring uniform and efficient.

3) Complex Monitoring of 16,000 Servers

With over 16,000 servers across 10 data centers, monitoring performance metrics such as CPU utilization, network latency, and server health was a significant challenge. The large-scale operations required advanced tools and methodologies to ensure all servers were consistently monitored with minimal performance degradation.

4) Predicting Downtime and Latency

Anticipating downtime and latency issues was a critical challenge, as these could significantly affect customer experience. Traditional monitoring approaches were insufficient to proactively detect potential service interruptions, requiring the implementation of machine learning techniques for more accurate forecasting.

Solutions

1) Splunk Data Stabilization and Dashboard Development

Veritis stabilized the Splunk environment by fixing the missing data issues and ensuring the server logs were consistently captured. By developing comprehensive Splunk dashboards, the team provided real-time insights into customer data, enhancing the client’s capacity to diagnose and resolve issues promptly and effectively.
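One common way to catch missing data is to compare the hosts that are expected to send logs against the time each host was last seen in the index. The sketch below assumes those last-seen times have already been pulled out of Splunk, for example with a search such as `| tstats latest(_time) as last_seen where index=* by host`. The host names, the 15-minute gap, and the `find_silent_hosts` function are hypothetical; the case study does not describe the exact checks Veritis used.

```python
from datetime import datetime, timedelta, timezone


def find_silent_hosts(expected_hosts, last_seen, now, max_gap):
    """Return expected hosts (sorted) whose latest log event is missing or older than max_gap.

    expected_hosts: iterable of host names that should be sending logs (hypothetical inventory)
    last_seen: dict mapping host name -> datetime of its most recent indexed event
    """
    silent = []
    for host in sorted(expected_hosts):
        ts = last_seen.get(host)
        # A host is "silent" if it never reported or has not reported recently
        if ts is None or now - ts > max_gap:
            silent.append(host)
    return silent


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    last_seen = {
        "app-01": now - timedelta(minutes=2),
        "app-02": now - timedelta(minutes=45),
        "db-01": now - timedelta(hours=3),
    }
    expected = {"app-01", "app-02", "db-01", "web-01"}
    print(find_silent_hosts(expected, last_seen, now, timedelta(minutes=15)))
    # ['app-02', 'db-01', 'web-01']
```

Checking against an expected host list, rather than only the hosts already in the index, is what surfaces servers that never sent data at all.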

2) Centralized Configuration Management with Ansible

To overcome the challenge of managing configurations across multiple data centers, Veritis implemented Ansible as a configuration management tool. This allowed for centralized deployment and configuration of monitoring tools, with network connections established across data centers to ensure synchronization and efficiency.
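The case study does not show the Ansible setup. As an illustration only, one way to let a single control node address servers in every data center is an Ansible dynamic inventory script: an executable that prints JSON groups when called with `--list`, with per-host variables under `_meta.hostvars`. The server list, group names, and `datacenter` variable below are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Ansible dynamic inventory grouping monitoring hosts by data center and role."""
import json
import sys

# Hypothetical server records; a real script would read these from a CMDB or cloud API.
SERVERS = [
    {"name": "mon-dc1-01", "dc": "dc1", "role": "zabbix_proxy"},
    {"name": "mon-dc1-02", "dc": "dc1", "role": "splunk_forwarder"},
    {"name": "mon-dc2-01", "dc": "dc2", "role": "zabbix_proxy"},
]


def build_inventory(servers):
    """Return the JSON structure Ansible expects from an inventory script's --list call."""
    inventory = {"_meta": {"hostvars": {}}}
    for server in servers:
        # Each host joins one group for its data center and one for its role
        for group in (server["dc"], server["role"]):
            inventory.setdefault(group, {"hosts": []})["hosts"].append(server["name"])
        inventory["_meta"]["hostvars"][server["name"]] = {"datacenter": server["dc"]}
    # Parent group so a single play can target every data center at once
    inventory["datacenters"] = {"children": sorted({s["dc"] for s in servers})}
    return inventory


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "--host":
        # Host variables are already supplied via _meta, so --host can return {}
        print(json.dumps({}))
    else:
        print(json.dumps(build_inventory(SERVERS), indent=2))
```

With the file marked executable, a command such as `ansible-playbook -i inventory.py monitoring.yml --limit dc2` would apply a (hypothetical) `monitoring.yml` playbook to one data center, while omitting `--limit` targets all of them from the same control node.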

3) Comprehensive Monitoring with Multiple Tools

The Veritis team used a combination of tools, including Zabbix, SolarWinds, Dynatrace, and Moogsoft, to monitor the health and performance of the client’s 16,000+ servers. Zabbix tracked CPU utilization and server health; SolarWinds monitored network latency and virtual machines across the data centers; Dynatrace simulated customer interactions with the website through synthetic monitoring; and Moogsoft used machine learning to correlate events for deeper insights.

4) Machine Learning for Predictive Monitoring

Veritis introduced machine learning techniques into the monitoring process to address downtime and latency issues. These models are designed to predict potential downtime and latency, allowing the client to take proactive measures before issues affect end users. To complement this predictive capability, the team is building an intuitive user interface through which all teams can search and monitor data across the seven tools.

Veritis Approach

Veritis has been working with the client for around 18 months. Our approach uses an onshore-offshore model, with experts and system administrators in the US and India working in coordination.

As part of the Monitoring Support Services group at the client location, our resources focused on managing the large servers that hold the logs, which provide useful insights into customer data. The team used seven different monitoring and log analysis tools, including Splunk, Zabbix, Moogsoft, Dynatrace, and SolarWinds.

In the first stage, the servers forwarded their logs to Splunk, where the logs were used to build dashboards and to create alerts that notify the operations team and the customer so that issues can be resolved as quickly as possible.

Zabbix was used to monitor server health, CPU utilization, and overall server status. Zabbix templates were created according to customer requests.
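In Zabbix, a template typically bundles items (such as the agent key `system.cpu.util`) with triggers; in the expression syntax introduced in Zabbix 5.4, a trigger like `avg(/host/system.cpu.util,5m)>90` fires when average CPU utilization over five minutes exceeds 90%. The Python sketch below only mimics that evaluation on in-memory samples to show the logic. It is not Zabbix code, and the window and threshold are illustrative values.

```python
from statistics import mean


def cpu_trigger(samples, now, window_s=300, threshold=90.0):
    """Mimic a trigger such as avg(/host/system.cpu.util,5m)>90.

    samples: list of (unix_timestamp, cpu_percent) pairs
    Returns True (problem) when the average of samples in the last window_s
    seconds exceeds threshold.
    """
    recent = [value for ts, value in samples if 0 <= now - ts <= window_s]
    if not recent:
        # No data in the window; Zabbix would normally cover this with a separate nodata() trigger
        return False
    return mean(recent) > threshold


if __name__ == "__main__":
    samples = [(0, 50.0), (100, 95.0), (200, 96.0), (300, 97.0)]
    print(cpu_trigger(samples, now=400))  # True: samples at 100-300 s average 96%
```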

SolarWinds was used to monitor the 10 data centers, covering parameters such as CPU status, network latency, and the status of virtual machines.
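Latency across many data centers is often summarized with percentiles rather than averages, because a few slow paths can hide behind a healthy mean. As a tool-agnostic sketch (not SolarWinds functionality), the function below computes the 95th-percentile latency per data center with the nearest-rank method; the data center names and sample values are made up.

```python
from collections import defaultdict


def p95_by_datacenter(measurements):
    """Return {datacenter: p95 latency} using the nearest-rank percentile.

    measurements: iterable of (datacenter, latency_ms) pairs
    """
    by_dc = defaultdict(list)
    for dc, latency_ms in measurements:
        by_dc[dc].append(latency_ms)
    result = {}
    for dc, values in by_dc.items():
        values.sort()
        # Nearest-rank: rank = ceil(0.95 * n), computed with integers to avoid float rounding
        rank = (95 * len(values) + 99) // 100
        result[dc] = values[rank - 1]
    return result


if __name__ == "__main__":
    samples = [("dc1", ms) for ms in range(1, 21)] + [("dc2", 10), ("dc2", 200)]
    print(p95_by_datacenter(samples))  # {'dc1': 19, 'dc2': 200}
```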

Dynatrace was also used to understand issues: its synthetic monitoring logs in to the website and replays the same actions a customer performs.
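Dynatrace synthetic monitors replay scripted user journeys (browser clickpaths) from test locations. A far simpler stand-in, shown here only to illustrate the idea, is a single timed HTTP request that flags failures and slow responses. The URL, timeout, and 2-second latency budget are assumptions, and the `opener` parameter exists only so the check can be tested without network access.

```python
import time
import urllib.request


def synthetic_check(url, timeout_s=10.0, max_latency_s=2.0, opener=urllib.request.urlopen):
    """Time one request to url and report whether it succeeded within the latency budget."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            status = resp.status
    except OSError as exc:
        # URLError, HTTPError (error status codes), and socket timeouts are all OSError subclasses
        return {"ok": False, "error": str(exc)}
    latency_s = time.monotonic() - start
    return {"ok": status == 200 and latency_s <= max_latency_s, "status": status, "latency_s": latency_s}
```

For example, `synthetic_check("https://www.example.com/")` returns a dict with `ok`, `status`, and `latency_s`; a scheduler would run it periodically and raise an alert whenever `ok` is false.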

All data was then sent to Moogsoft, an AI-driven analytics tool hosted in the cloud by a third party, which used machine learning algorithms to correlate events and give a better understanding of customer issues.
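Moogsoft’s correlation models are its own and are not described in the case study. To show the basic idea of event correlation, the sketch below uses a much simpler rule-based technique, not machine learning: alerts from different tools that hit the same host within a short time window are grouped into one incident candidate. The `Alert` fields, tool names, and 120-second window are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str   # originating tool, e.g. "zabbix" or "splunk"
    host: str
    ts: float     # unix timestamp in seconds
    message: str


def correlate(alerts, window_s=120):
    """Group alerts per host: an alert joins its host's current group if it arrives
    within window_s seconds of that group's latest alert, otherwise it starts a new group."""
    clusters = []
    current = {}  # host -> the most recent cluster for that host
    for alert in sorted(alerts, key=lambda a: a.ts):
        cluster = current.get(alert.host)
        if cluster is not None and alert.ts - cluster[-1].ts <= window_s:
            cluster.append(alert)
        else:
            cluster = [alert]
            clusters.append(cluster)
            current[alert.host] = cluster
    return clusters


if __name__ == "__main__":
    alerts = [
        Alert("zabbix", "web-01", 0, "CPU > 90%"),
        Alert("splunk", "web-01", 60, "HTTP 5xx spike"),
        Alert("solarwinds", "db-01", 70, "Latency high"),
        Alert("dynatrace", "web-01", 500, "Login journey failed"),
    ]
    for group in correlate(alerts):
        print([a.source for a in group])
    # ['zabbix', 'splunk']
    # ['solarwinds']
    # ['dynatrace']
```

Grouping this way turns four raw alerts into three incident candidates, which is the noise reduction that correlation tools aim for at much larger scale.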

Alerts from all of these tools were integrated with ServiceNow, a ticketing tool that automatically creates tickets and notifies the respective teams so they can take the necessary action.
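The case study does not say how the alerts reach ServiceNow. One documented route is ServiceNow’s REST Table API, where a `POST` to `/api/now/table/incident` creates an incident record. The sketch below only builds such a request with Python’s standard library. The instance name, credentials, alert dictionary shape, and severity-to-urgency mapping are hypothetical, and basic authentication is used only for brevity.

```python
import base64
import json
import urllib.request


def build_incident_request(instance, user, password, alert):
    """Build (but do not send) a ServiceNow Table API request that creates an incident.

    alert: hypothetical dict with "source", "host", "summary", "severity", optional "details"
    """
    url = f"https://{instance}.service-now.com/api/now/table/incident"
    level = "1" if alert["severity"] == "critical" else "2"  # 1 = High, 2 = Medium in default choices
    body = {
        "short_description": f"[{alert['source']}] {alert['host']}: {alert['summary']}",
        "description": alert.get("details", ""),
        "urgency": level,
        "impact": level,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

Sending it would then be `urllib.request.urlopen(req)`, whose JSON response contains the new incident record under `result`.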

We recently implemented machine learning to predict website downtime and latency issues. Our team is also building a user interface through which all teams can log in and search any of the seven tools.
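The models behind this prediction are not described. As a deliberately simple stand-in, not the machine learning the team used, the sketch below fits an ordinary least-squares trend line to recent per-minute latency samples and estimates how many minutes remain before the trend crosses an alert threshold. The one-sample-per-minute cadence, the 200 ms threshold, and the function names are assumptions.

```python
def fit_trend(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept) of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def minutes_until_breach(latencies_ms, threshold_ms):
    """Predict minutes after the last sample until the latency trend reaches threshold_ms.

    latencies_ms: one sample per minute, oldest first (at least two samples).
    Returns None when the trend is flat or improving.
    """
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = list(range(len(latencies_ms)))
    slope, intercept = fit_trend(xs, latencies_ms)
    if slope <= 0:
        return None
    crossing = (threshold_ms - intercept) / slope  # x where the trend line hits the threshold
    return max(0.0, crossing - xs[-1])


if __name__ == "__main__":
    print(minutes_until_breach([100, 110, 120, 130, 140], threshold_ms=200))  # 6.0
```

A linear trend only captures steady drift; real predictive models would also account for seasonality and sudden changes, which is where the machine learning described above comes in.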

The data centers run on a hybrid model, with a few servers on Microsoft Azure and the rest on a private cloud. Uptime currently stands at 99.99%, with a target of 99.9999999% and negligible downtime.
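Reading these percentages as availability, the distance between the current level and the target is clearer as a yearly downtime budget: downtime = (1 - availability) × seconds per year. The sketch below assumes a 365-day year and takes the percentage as a string so that `Decimal` can keep the arithmetic exact.

```python
from decimal import Decimal

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s in a non-leap year


def allowed_downtime_s(availability_pct):
    """Yearly downtime budget in seconds for an availability given as a string, e.g. "99.99".

    Decimal keeps the arithmetic exact, which matters for many-nines targets.
    """
    unavailable_fraction = (Decimal(100) - Decimal(availability_pct)) / 100
    return SECONDS_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    for target in ("99.99", "99.9999999"):
        print(target, allowed_downtime_s(target))
```

At 99.99% the budget is 3,153.6 seconds (52.56 minutes) per year; at 99.9999999% it shrinks to about 0.0315 seconds (roughly 31.5 milliseconds) per year.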

Key Benefit to Client

The Splunk dashboards built by Veritis gave the client real-time visibility into customer data, enabling it to diagnose and resolve issues faster and to provide much better service to its end customers.

Tools

Splunk, Zabbix, Moogsoft, Dynatrace, SolarWinds, ServiceNow
