Client Background
The client is an American media measurement and analytics company providing marketing data and analytics to enterprises, media and advertising agencies, and publishers.
Client Need
Needed consultation on evaluating tools and approaches for cloud adoption. The objective was to offload computing from the existing, outdated on-premises MapR cluster to the cloud.
Needed a solution custom-built for their live data (largest module) for evaluation and decision-making.
Needed an automated solution for resource configuration, deployment, scheduling, scalability, etc.
Needed the ability to process incoming incremental data (10 TB or more) more efficiently.
Solution
Provided a cloud-optimized solution that spins up compute on demand for the offloaded workloads, along with a Snowflake-based reporting solution.
Each week, 5 TB or more of data is extracted from the on-premises MapR cluster and placed in S3 using shell scripts and the AWS CLI, executed by Airflow jobs.
Based on the size of the data copied over, an AWS EMR cluster is spun up using CloudFormation templates and the AWS CLI to execute Spark and Pig scripts.
After processing on EMR, the resulting data is pushed into S3 buckets for persistence.
The AWS EMR cluster has auto-scaling enabled and is terminated once processing completes.
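The flow above can be sketched in Python as a set of helpers that build the AWS CLI commands an Airflow task might run. This is a minimal illustration under stated assumptions, not the project's actual code: the sizing rule (one core node per 0.5 TB, between 2 and 40 nodes), the CloudFormation parameter name CoreInstanceCount, the bucket, path, and stack names, and the use of stack deletion to remove the cluster are all hypothetical. Only the CLI subcommands themselves (aws s3 sync, aws cloudformation create-stack, aws cloudformation delete-stack) are real.

```python
import math


def core_node_count(data_tb, tb_per_node=0.5, min_nodes=2, max_nodes=40):
    """Pick an EMR core node count from the input size (hypothetical rule)."""
    needed = math.ceil(data_tb / tb_per_node)
    # Clamp to the assumed lower and upper bounds.
    return max(min_nodes, min(max_nodes, needed))


def s3_sync_command(local_dir, bucket, prefix):
    """Build the AWS CLI call that copies an extracted directory to S3."""
    return ["aws", "s3", "sync", local_dir, f"s3://{bucket}/{prefix}"]


def create_stack_command(stack_name, template_url, data_tb):
    """Build the CloudFormation call that spins up a cluster sized to the data.

    CoreInstanceCount is an assumed template parameter, not a built-in one.
    """
    nodes = core_node_count(data_tb)
    return [
        "aws", "cloudformation", "create-stack",
        "--stack-name", stack_name,
        "--template-url", template_url,
        "--parameters",
        f"ParameterKey=CoreInstanceCount,ParameterValue={nodes}",
    ]


def delete_stack_command(stack_name):
    """Build the call that tears the cluster down after processing (assumed)."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack_name]


if __name__ == "__main__":
    # Print the commands for a hypothetical 5 TB weekly run; nothing is executed.
    steps = [
        s3_sync_command("/data/export", "example-bucket", "weekly"),
        create_stack_command(
            "emr-weekly",
            "https://example-bucket.s3.amazonaws.com/emr.yaml",
            5,
        ),
        delete_stack_command("emr-weekly"),
    ]
    for step in steps:
        print(" ".join(step))
```

Building the commands as argument lists keeps them easy to test without AWS credentials; in a scheduler they could be passed to subprocess.run or joined into a shell command for an Airflow task.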
Realized Benefits
Provided a cost-efficient, on-demand solution for computation on the AWS platform.
Added value by providing best-suited recommendations for resource type and configuration for a cost-efficient and optimal solution.
Offloaded jobs that would need 48 hours on the on-premises server to the cloud and processed them within 24 hours.
Tools & Technologies
Amazon S3
Apache Pig
Apache Spark
AWS CloudFormation
Amazon EMR
MapR
Apache Airflow
Python
R
PowerShell
Snowflake
Bash