
success story

Virtusa helps a major U.S. bank restructure its data quality checking process with simplified automation

The Challenge

Our client, a large U.S. bank and a leading provider of wealth management and custody services, had a legacy data warehouse environment with poor data quality. When Virtusa began building a new data warehouse on Snowflake, with a new data model to replace the old one, creating and running data quality checks in the new environment became a challenge.

The Solution

Virtusa developed and implemented unit and end-to-end data quality testing for Snowflake using Virtusa Data Quality Checks and Great Expectations. The Bank’s IT team and Virtusa jointly developed the Test Cases, Virtusa created the Test Suites, and the Bank verified the executions. The tests were automated and orchestrated using Apache Airflow, which represents schedules and job dependencies as a directed acyclic graph (DAG) and displays them visually.
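To illustrate what orchestrating jobs as a directed graph means, here is a minimal sketch using only Python's standard library rather than Airflow itself. The job names and their dependencies are hypothetical and are not taken from the client's pipeline; the point is only that each job runs after the jobs it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs (assumed for illustration): each key maps to the set of
# jobs that must finish before it can start.
dependencies = {
    "load_raw": set(),
    "unit_tests_raw": {"load_raw"},
    "transform": {"unit_tests_raw"},
    "unit_tests_transform": {"transform"},
    "end_to_end_tests": {"unit_tests_transform"},
}


def execution_order(graph):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(dependencies)
for step, job in enumerate(order, start=1):
    print(step, job)
```

An orchestrator such as Airflow adds scheduling, retries, and a visual view on top of this ordering idea; a cycle in the graph would make the ordering impossible, which is why the graph has to be acyclic.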

The streamlined data processes included: 

  • Creating over a thousand white-box unit tests to ensure that data movement (from on-premises to raw) and subsequent pipeline stages, including complex transformations, were accurate and complete
  • Simplifying the legacy implementation of data quality checks
  • Automating nightly testing of the entire suite, as well as of specific files or subject areas that may have changed during development
  • Customizing testing parameters based on environment variables
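
As a rough illustration of the kinds of checks listed above, the following sketch runs a completeness check and a null-rate check whose threshold comes from environment variables. It is not the Virtusa Data Quality Checks framework or Great Expectations: SQLite stands in for Snowflake, and the table names, the variable names DQ_ENV and DQ_MAX_NULL_FRACTION, and the sample data are all assumptions made for this example.

```python
import os
import sqlite3


def load_settings(env=os.environ):
    """Read test parameters from environment variables (hypothetical names)."""
    return {
        "environment": env.get("DQ_ENV", "dev"),
        "max_null_fraction": float(env.get("DQ_MAX_NULL_FRACTION", "0.0")),
    }


def check_row_counts_match(conn, source_table, target_table):
    """Completeness: the target should hold as many rows as the source."""
    # Table names are trusted here; a real framework would validate them.
    src = conn.execute(f"SELECT COUNT(*) FROM {source_table}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]
    return src == tgt


def check_null_fraction(conn, table, column, max_fraction):
    """Accuracy: the share of NULLs in a column must not exceed the threshold."""
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    if total == 0:
        return True  # an empty table has no NULLs to report
    return (nulls / total) <= max_fraction


# In-memory SQLite database standing in for the source and raw zones.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE source_accounts (account_id INTEGER, balance REAL);
    CREATE TABLE raw_accounts (account_id INTEGER, balance REAL);
    INSERT INTO source_accounts VALUES (1, 100.0), (2, 250.5), (3, NULL);
    INSERT INTO raw_accounts SELECT * FROM source_accounts;
    """
)

# A dict is passed here for a repeatable demo; omit it to read os.environ.
settings = load_settings({"DQ_ENV": "test", "DQ_MAX_NULL_FRACTION": "0.5"})
results = {
    "row_counts_match": check_row_counts_match(
        conn, "source_accounts", "raw_accounts"
    ),
    "balance_nulls_ok": check_null_fraction(
        conn, "raw_accounts", "balance", settings["max_null_fraction"]
    ),
}
print(settings["environment"], results)
```

Keeping thresholds in environment variables lets the same checks run with looser limits in development and stricter ones in production without changing the test code.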
The Outcome

With Virtusa’s help, the client was able to conduct all necessary data checks and increase productivity while conserving resources. Following the test-first methodology required by the client, we built a three-member testing team to partner with the Bank’s IT team and define Test Cases. We then worked with a 15-person Virtusa development team to implement all Unit Tests. As the pipeline progressed, the team also implemented end-to-end tests on the consumer zone in Snowflake to verify values such as account balances.

As a result, the client was able to: 

  • Successfully use Virtusa’s warehouse ingestion and data pipeline development across many data conditions and variations.
  • Achieve testing and scheduling automation using both the command line and Airflow.
  • Establish clear, consistent, and accurate data quality with no licensing costs.
Data Quality Checks (DQC) Framework solution

Learn how Virtusa’s Data Quality Checks Framework solution can transform and automate the way your company conducts data quality checks. 
