Data Integration

We design and build foundational data platforms to support enterprise-wide digital transformation.

  • Data platforms in RDBMS (Oracle, SQL Server, DB2, etc.)
  • DWH appliances (Netezza, SAP HANA, Teradata)
  • Cloud-based DW (AWS, HP Vertica, etc.)
  • NoSQL platforms (including Hadoop, Neo4j, MongoDB, etc.)

We have extensive experience in Extraction, Transformation and Loading (ETL) of data from a wide variety of sources, including legacy applications; ERP and CRM systems; web content; standard relational databases; HDFS; unstructured data such as social media, blogs and machine data; NoSQL databases such as MongoDB, Cassandra and HBase; on-premise and cloud-based applications; files (e.g. XML, Excel, CSV, flat files); and web service APIs.

Our specialized enterprise information integration skills span cutting-edge ETL tools, including DataStage, Oracle Data Integrator, MS-SSIS, Informatica, Talend and Ab Initio, and support delivery of scalable and reliable data integration solutions to our clients across the globe. Virtusa's data integration service capabilities acquire structured and unstructured data from virtually any source, then integrate and deliver quality data in a high-performance environment.

Our success stories:

  • Cloud-based analytical data platform for a leading price comparison provider: Developed a next-generation cloud-based analytical data platform through new enterprise data models that are consolidated, conformed and consistent.
  • Agile data lake to monetize data effectively for a leading healthcare analytics provider: Developed a Hadoop-based agile data lake with streamlined data ingestion, distributed data storage in the data lake (HDFS) and standardized, scalable data processing.
  • Hadoop data platform for a leading media and information firm: Developed a big-data-based Integrated Metrics System (IMS) to deliver research metrics and significantly reduce data processing time.
  • Appliance migration – Oracle to Netezza for a leading insurance company: Provided a data store consolidation and migration strategy from the Oracle platform to the Netezza platform.
  • Customer data store implementation for a leading telco: Consolidated customer data from 50+ diverse legacy systems to create a 360-degree view of the customer profile.

Data Integration Center of Excellence

Our Data Integration CoE offers a comprehensive range of services to provide our clients the information they need in the appropriate format, so that their time is effectively spent on making accurate and timely business decisions.

Our Data Integration services are based on industry best practices, methodologies, rich domain expertise and experience across similar engagements.

Our Data Integration solutions are delivered by a dedicated pool of data integration specialists, domain experts, business analysts and technical architects, backed by our time-tested CoE processes.

  • Over 500 Data Integration service professionals in our CoE provide high-end design, architecture and technical leadership

  • Proven solution accelerators and frameworks that increase productivity, decrease total cost for EIM implementations and improve time to market

  • Reusable components that accelerate project delivery

  • Substantial cost reduction realized through our global delivery model

  • Right-sized, high-touch, client-centric relationship management, providing superior customer experience

Tools and Accelerators

  • Data model accelerators: Domain-focused data model accelerators, developed to the logical data model level, with provision to add data elements that may be required later
  • ETL accelerators:
    • Top N time-consuming job monitor: Automated mechanism that identifies resource-consuming ETL jobs so that performance tuning efforts can be initiated
    • Job compare tool: Automates tracking of version changes for ETL jobs and compares versions of a job, enabling quick rollout or rollback of jobs affected by a change request
    • Impact analysis tool: Development aid for impact analysis, search utilities, parameter listing and job complexity
    • ETL test automation tool: Automated way to compare and validate metadata, with an automated verification and validation process ensuring 100% accuracy in less time
    • ETL jobs / load statistics tool: Automates the monitoring of DataStage jobs through a UNIX scripting solution that collects job statistics once the daily load completes
    • DB rejects capture tool: Captures the records that were missed during loading because of database constraints, identifies incomplete data loads at the table / row level and provides a summary of why there is a data reconciliation mismatch
    • Attributes reconciliation: Provides an automated, on-demand flash of the actual attribute count across projects, using a rule-engine-based workflow with configurable counting algorithms that saves reconciliation time
  • ETL code review tool:
    • Enables standardization which ensures code quality and effective job implementation
    • Scalable and configurable solution that can be extended across projects and scenarios
    • Batch processing enables review of multiple Talend objects in one pass
    • Provides accurate error details on the failed objects in the code review report
    • Eightfold increase in productivity
    • Ability to review changes that impact multiple jobs
  • Data validation test tool:
    • Validates that all records, all fields and the complete data for each field are loaded, and checks source and target data for counts and completeness
    • Automated way to compare and perform metadata validation
  • Delivery assurance tools and templates:
    • EIM Design review checklist
    • INFA, Oracle, Teradata, Talend, Ab Initio, DataStage – development standards and best practices
    • Development checklist and deployment checklist
  • ETL migration framework:
    • Common ETL migration framework that can be extended across tools: SSIS to Informatica and Ab Initio to Talend, etc.
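
To illustrate the kind of check the data validation test tool performs, the sketch below compares source and target tables for counts and completeness. It is a minimal illustration only, not the tool itself: the function name validate_load, the table names src_customers and tgt_customers, and the use of an in-memory SQLite database are all assumptions made for the example.

```python
import sqlite3


def validate_load(conn, source, target, columns):
    """Compare row counts and per-column non-null counts between two tables.

    Returns a dict of mismatches: {check_name: (source_count, target_count)}.
    Table and column names are assumed to be trusted identifiers.
    """
    cur = conn.cursor()
    report = {}

    # Count check: every source record should arrive in the target.
    src_rows = cur.execute(f'SELECT COUNT(*) FROM "{source}"').fetchone()[0]
    tgt_rows = cur.execute(f'SELECT COUNT(*) FROM "{target}"').fetchone()[0]
    report["row_count"] = (src_rows, tgt_rows)

    # Completeness check: COUNT(column) ignores NULLs, so a lower target
    # count means some values for that field were lost during the load.
    for col in columns:
        src = cur.execute(f'SELECT COUNT("{col}") FROM "{source}"').fetchone()[0]
        tgt = cur.execute(f'SELECT COUNT("{col}") FROM "{target}"').fetchone()[0]
        report[col] = (src, tgt)

    # Keep only the checks where source and target disagree.
    return {name: counts for name, counts in report.items() if counts[0] != counts[1]}


# Hypothetical demo data: the target is missing one email value.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE src_customers (id INTEGER, name TEXT, email TEXT);
    CREATE TABLE tgt_customers (id INTEGER, name TEXT, email TEXT);
    INSERT INTO src_customers VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Bob', 'bob@example.com'),
        (3, 'Cy', 'cy@example.com');
    INSERT INTO tgt_customers VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Bob', NULL),
        (3, 'Cy', 'cy@example.com');
""")

mismatches = validate_load(conn, "src_customers", "tgt_customers", ["id", "name", "email"])
print(mismatches)  # {'email': (3, 2)}
```

An empty result means the counts and field-level completeness match. A production check would typically also compare values (for example through checksums), which this sketch leaves out.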