Client Background
A healthcare provider with a monitoring and remediation SRE team that monitors 400+ clusters daily, with 5,000 instances, along with critical infrastructure and applications, around the clock
Client Need
Provide Daily Monitoring Capability and Failure Remediation with short SLAs for Business-Critical Applications and Pipelines
Plan and Execute Disaster Recovery for applications and adhere to the application security compliance requirements set by the Cloud team
Support related to Access, Infra and Engineering
Enhancements to Improve Performance and Reduce Costs
Solution
Dedicated Site Reliability Engineering (SRE) team working 24/7/365 to handle critical data pipelines and applications
BMC Remedyforce, Moogsoft and xMatters integrations to alert on failures, backed by well-documented SOPs with resolutions
Support for Lower Environments – Well-Structured Wiki Pages for Guidance
Release Management and Deployment Strategy to propagate features to higher environments
Enhancements and Optimizations in Process and Infrastructure
Realized Benefits
Applications Running Smoothly – 24/7
Failures Remediated within a Short SLA
End-to-End Validated Deliveries Post-Deployment and Immediate Rollback of Faulty Features
Reduced Cost – S3 Clean-Ups – TBs of Data
Long-Running EMRs
Application Optimizations: Runtime Reduction
Timely SSL and Data Package Renewals
Early Remediation through Early Detection of Data Loss via Stats Reports
Tools & Technologies
AWS
Amazon S3 Bucket
Spark
Airflow
Moogsoft
xMatters
GitLab
BMC Remedyforce