Monitoring Support Services for a Provider of Cloud HR Solutions
The client is a provider of cloud-based HR solutions delivered under the software-as-a-service (SaaS) model. It is a subsidiary of a leading global multinational corporation specializing in enterprise software for managing business operations and customer relations.
The client has an extensive infrastructure of more than 16,000 servers. It engaged Veritis for system administration, implementation of monitoring tools, and troubleshooting of technical issues.
The key requirements were to stabilize the Splunk environment and recover its missing data, which is crucial for troubleshooting customer issues, and to implement Hadoop.
Veritis has worked with the client for about 18 months. Our approach follows an onshore-offshore model, with experts and system administrators coordinating from both the US and India.
As part of the Monitoring Support Services group at the client location, our resources focused on managing the large servers that hold the logs from which insights into customer data are drawn. The team used seven monitoring and log analysis tools, including Splunk, Zabbix, Moogsoft, Dynatrace, and SolarWinds.
In the first stage, the servers sent their logs to Splunk, where they were used to build dashboards and to create alerts that notified the operations team and the customer so that issues could be resolved as quickly as possible.
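To make the first stage concrete, here is a minimal Python sketch of how a server might forward one log event to Splunk over the HTTP Event Collector (HEC). The source does not say which ingestion path was used (Splunk forwarders are equally common), so HEC is an assumption here; the URL, token, host name, and index are placeholders. The function only builds the request; passing it to `urllib.request.urlopen` would send it.

```python
import json
import time
import urllib.request


def build_hec_request(base_url, token, event, host, sourcetype="_json",
                      index=None, timestamp=None):
    """Build (but do not send) a Splunk HEC request carrying one event.

    base_url, token, host, and index are placeholders for illustration.
    """
    payload = {
        # HEC accepts the event time as seconds since the epoch.
        "time": timestamp if timestamp is not None else time.time(),
        "host": host,
        "sourcetype": sourcetype,
        "event": event,
    }
    if index is not None:
        payload["index"] = index
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/services/collector/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # HEC authenticates with "Splunk <token>" in the Authorization header.
            "Authorization": f"Splunk {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_hec_request(
        "https://splunk.example.com:8088",  # hypothetical HEC endpoint
        "HEC-TOKEN-PLACEHOLDER",
        {"level": "ERROR", "msg": "payroll job failed"},
        host="app-01",
        index="app_logs",
    )
    print(req.get_method(), req.full_url)
```

Dashboards and alerts in Splunk are then built on top of the indexed events; the sketch covers only the sending side.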
Zabbix was used to monitor server health, CPU utilization, and overall server status. Zabbix templates were created on customer request.
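Zabbix templates bundle monitored items with triggers that decide when a value is a problem. As a hedged illustration of the kind of rule such a template might contain (in Zabbix 5.4+ syntax, something like `avg(/web-01/system.cpu.util,5m)>90`), the Python sketch below evaluates an average-CPU-over-a-window condition. The 5-minute window, the 90% threshold, and the `CpuSample` type are assumptions for illustration, not the client's actual templates.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CpuSample:
    timestamp: float  # seconds since the epoch
    cpu_util: float   # CPU utilization in percent


def cpu_trigger_fires(samples, now, window_s=300.0, threshold=90.0):
    """Return True if average CPU utilization over the last window_s seconds
    exceeds threshold.

    Approximates a Zabbix-style avg(...,5m)>90 trigger; window and threshold
    are illustrative assumptions.
    """
    recent = [s.cpu_util for s in samples if now - window_s < s.timestamp <= now]
    if not recent:
        # No data in the window: treat as not firing here. A real setup might
        # raise a separate "no data" alert instead.
        return False
    return mean(recent) > threshold


if __name__ == "__main__":
    samples = [CpuSample(t, 95.0) for t in range(60, 301, 60)]
    print(cpu_trigger_fires(samples, now=300))
```

Averaging over a window rather than alerting on a single reading avoids paging the operations team for brief CPU spikes.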
SolarWinds was used to monitor the 10 data centers. Monitored parameters included CPU status, network latency, and the status of virtual machines, among others.
Dynatrace was used for synthetic monitoring: it logs in to the website and replays the same actions that customers perform, which helped the team understand issues from the customer's perspective.
All data was then forwarded to Moogsoft, a third-party, cloud-hosted AI analytics tool that uses machine learning to correlate events and give a clearer picture of customer issues.
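Moogsoft's machine-learning correlation is proprietary, so the sketch below does not reproduce it. It illustrates the simpler idea behind event correlation instead: alerts from different tools that concern the same service and arrive close together in time are grouped into one cluster, so operators see one incident rather than a flood of separate alerts. The `Alert` fields, the service names, and the 120-second window are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float   # seconds since the epoch
    source_tool: str   # e.g. "Zabbix", "Splunk" (illustrative)
    service: str       # hypothetical service identifier
    message: str


def correlate(alerts, window_s=120.0):
    """Group alerts for the same service when each one arrives within
    window_s seconds of the previous alert in its group.

    A simple rule-based stand-in for ML correlation, not Moogsoft's method.
    """
    clusters = []
    latest_cluster_for = {}  # service -> index of its most recent cluster
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        idx = latest_cluster_for.get(alert.service)
        if idx is not None and alert.timestamp - clusters[idx][-1].timestamp <= window_s:
            clusters[idx].append(alert)
        else:
            clusters.append([alert])
            latest_cluster_for[alert.service] = len(clusters) - 1
    return clusters


if __name__ == "__main__":
    alerts = [
        Alert(0, "Zabbix", "payroll", "CPU high"),
        Alert(30, "Splunk", "payroll", "5xx errors"),
        Alert(50, "SolarWinds", "recruiting", "network latency"),
        Alert(60, "Dynatrace", "payroll", "login slow"),
    ]
    for group in correlate(alerts):
        print(group[0].service, [a.source_tool for a in group])
```

Machine-learning approaches replace the fixed service key and time window with learned similarity between events, but the output has the same shape: a small number of correlated groups.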
Alerts from all of these tools were integrated with ServiceNow, a ticketing platform that automatically creates tickets and notifies the responsible teams so they can take the necessary action.
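A minimal sketch of the alert-to-ticket step, assuming the integration creates incidents through the ServiceNow Table API (`POST /api/now/table/incident`) with basic authentication. The source does not describe the integration mechanism; the severity-to-urgency/impact mapping, the alert dictionary shape, the instance URL, and the credentials are hypothetical, and a real deployment might use OAuth or a vendor-provided connector instead. The function builds the request without sending it.

```python
import base64
import json
import urllib.request

# Hypothetical mapping from a monitoring tool's severity to ServiceNow
# urgency/impact values ("1" = High, "3" = Low).
SEVERITY_TO_FIELDS = {
    "critical": {"urgency": "1", "impact": "1"},
    "major": {"urgency": "2", "impact": "2"},
    "minor": {"urgency": "3", "impact": "3"},
}


def build_incident_request(instance_url, user, password, alert, assignment_group):
    """Build (but do not send) a Table API request that opens an incident.

    alert is a hypothetical dict with "tool", "severity", "summary", and an
    optional "details"; assignment_group is assumed to be the group's sys_id.
    """
    fields = SEVERITY_TO_FIELDS.get(alert["severity"], {"urgency": "3", "impact": "3"})
    payload = {
        "short_description": f"[{alert['tool']}] {alert['summary']}",
        "description": alert.get("details", ""),
        "assignment_group": assignment_group,
        **fields,
    }
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=instance_url.rstrip("/") + "/api/now/table/incident",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Routing is driven by `assignment_group`, which is what lets ServiceNow notify the responsible team without manual triage.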
We recently implemented machine learning models to predict website downtime and latency issues. Our team is also building a single user interface that will let every team log in and search across all seven tools.
The data centers follow a hybrid model, with some servers running on Microsoft Azure and the rest on a private cloud. We currently keep the data centers running at 99.99% availability, with a target of 99.9999999% availability and negligible downtime.
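Treating those percentages as availability, the gap between the current and target figures is easier to see as allowed downtime per year. The short calculation below assumes availability is measured over an average 365.25-day year: 99.99% permits about 52.6 minutes of downtime per year, whereas 99.9999999% permits about 31.6 milliseconds.

```python
# Average year length in seconds, counting leap years (365.25 days).
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 s


def allowed_downtime_seconds(availability_percent, period_s=SECONDS_PER_YEAR):
    """Return the downtime (in seconds) permitted over period_s at the given availability."""
    return (1 - availability_percent / 100) * period_s


if __name__ == "__main__":
    print(f"99.99%: {allowed_downtime_seconds(99.99) / 60:.1f} min/year")  # 52.6 min/year
    print(f"99.9999999%: {allowed_downtime_seconds(99.9999999) * 1000:.1f} ms/year")  # 31.6 ms/year
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why moving from four nines to nine nines is a far larger step than the percentages suggest.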
Key Benefit to the Client
The Splunk dashboards built by Veritis enabled the client to provide much better service to its end customers.
Challenges
Because the data centers are spread across multiple locations, designing a centralized deployment model for tool configurations was a challenge. We overcame it by setting up the Ansible configuration management tool in one data center and opening network connections from it to the other data centers.
Technologies Used
Splunk, Zabbix, Moogsoft, Dynatrace, SolarWinds, ServiceNow