The solution

Healthfirst worked with Virtusa to bring in the technical skills and infrastructure necessary for building an Amazon Redshift data mart.

Our experience, coupled with our deep digital engineering capabilities, made us a vendor of choice as the client set out to automate data loading and eliminate operational bottlenecks.

With the help of Healthfirst’s ODS platform (built on Amazon S3) and Amazon Aurora PostgreSQL, we ingested data into an Amazon Redshift data mart using PySpark ETL. Key aspects of the delivered solution include:

Rearchitected the data integration layer with a parameterized, configurable open-source data transformation framework that manages end-to-end functionality
Developed the framework using Sqoop, PySpark, AWS Glue, the AWS Glue Data Catalog, and Amazon S3
Automated the error detection process, leading to faster root cause identification
Provided ongoing IT support for incident management with resolution SLAs
Implemented Change Data Capture (CDC) identification based on SHA-256 (SHA-2) hashing, with support for Type 1, Type 2, and hybrid Slowly Changing Dimension (SCD) types
Built an audit component that captures attributes at the finest level of detail
Enabled restart from any failure point
Enabled logging by default for every process
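
To illustrate the hash-based change detection and Type 2 handling listed above, here is a minimal sketch in plain Python. It is not the delivered framework: the column names (member_id, plan, city), the helper functions, and the sample rows are all hypothetical, and a PySpark job would more likely compute the hash with pyspark.sql.functions.sha2(col, 256) over concatenated columns. The idea is the same in both cases: hash the tracked columns of each row, compare against the hash of the active version, and version the rows whose hash differs.

```python
import hashlib
from datetime import date


def row_hash(row, tracked_cols):
    """SHA-256 digest over the tracked columns of one row."""
    # Join with a unit separator so ("ab", "c") and ("a", "bc") hash differently.
    payload = "\x1f".join("" if row.get(c) is None else str(row[c]) for c in tracked_cols)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def detect_changes(active_rows, incoming, key, tracked_cols):
    """Split incoming rows into inserts, updates, and unchanged by comparing hashes."""
    existing = {r[key]: row_hash(r, tracked_cols) for r in active_rows}
    inserts, updates, unchanged = [], [], []
    for r in incoming:
        if r[key] not in existing:
            inserts.append(r)
        elif existing[r[key]] != row_hash(r, tracked_cols):
            updates.append(r)
        else:
            unchanged.append(r)
    return inserts, updates, unchanged


def apply_scd2(dimension, updates, inserts, key, load_date):
    """Type 2: close the active version of each changed key, then append new versions."""
    changed = {r[key] for r in updates}
    result = []
    for r in dimension:
        if r[key] in changed and r["is_current"]:
            r = {**r, "is_current": False, "end_date": load_date}
        result.append(r)
    for r in updates + inserts:
        result.append({**r, "is_current": True, "start_date": load_date, "end_date": None})
    return result


# Hypothetical dimension table and incoming batch.
dim = [
    {"member_id": 1, "plan": "A", "city": "NYC", "is_current": True,
     "start_date": date(2023, 1, 1), "end_date": None},
    {"member_id": 2, "plan": "B", "city": "Albany", "is_current": True,
     "start_date": date(2023, 1, 1), "end_date": None},
]
incoming = [
    {"member_id": 1, "plan": "A", "city": "NYC"},      # no change
    {"member_id": 2, "plan": "C", "city": "Albany"},   # plan changed
    {"member_id": 3, "plan": "A", "city": "Buffalo"},  # new key
]

active = [r for r in dim if r["is_current"]]
inserts, updates, unchanged = detect_changes(active, incoming, "member_id", ["plan", "city"])
new_dim = apply_scd2(dim, updates, inserts, "member_id", date(2024, 1, 1))
```

A Type 1 dimension would overwrite the changed row in place instead of closing it and appending a new version. A hybrid dimension would apply one treatment or the other per column.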

AWS Services Used:

Amazon Redshift
Amazon Aurora
Amazon S3
Amazon EMR
AWS Glue
