success story

Clarivate modernizes its data management service with Amazon EMR

As a global leader in providing information and insights, Clarivate delivers critical data, information, workflow solutions, and deep domain expertise to accelerate the pace of innovation. Actionable insights derived from big data are an integral part of the company’s regular business operations.

With big data traffic growing exponentially, the company’s previous setup struggled to cope with sudden changes in demand, and scaling had become expensive. Aiming to deliver simplicity and efficiency for its customers, Clarivate chose Amazon Elastic MapReduce (Amazon EMR), with Virtusa as its partner, to scale services on demand while securing data.

The Challenge

Clarivate needed to scale out rapidly to meet changing demands. It also needed a solution that would help control costs, improve efficiency and scaling capabilities, and reduce dependence on multiple vendors for data and analytics.


Amazon EMR is well known for its ability to analyze vast data sets quickly and enhance data management efficiency. Virtusa helped replace the company’s existing system with Amazon EMR. Because Clarivate was already using other AWS services, adopting Amazon EMR would also help maximize ROI.

Salient features of the deployment include:

  • Virtusa proposed that Clarivate migrate to Amazon EMR because many other AWS services, such as Amazon EC2 and Amazon S3, were already being used for other Clarivate projects.
  • The Amazon EMR workflow launched and terminated clusters automatically through scripts in the batch processing workflow. It also used auto-scaling based on the container ratio and HDFS space availability, along with memory-intensive node types.
  • Some of the big data jobs used HDFS for intermediate data storage and Amazon RDS for Oracle for database procedures. The final output data was written to Amazon S3 buckets, to which Amazon S3 lifecycle rules were applied.
  • Cluster management and deployment were automated for the customer through the AWS CLI with shell scripting; bootstrapping was used to copy certain application-specific files to all data nodes.
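
The cluster automation described above can be illustrated with a minimal sketch. The actual deployment used the AWS CLI with shell scripting; the Python below is only an assumed equivalent that builds a request for the boto3 EMR client's run_job_flow call. The cluster name, release label, instance types, thresholds, script path, and step command are all hypothetical, and mapping "container ratio" and "HDFS space availability" to the CloudWatch metrics ContainerPendingRatio and HDFSUtilization is an assumption, not something the case study confirms.

```python
"""Sketch of a transient EMR cluster request. All names and values are hypothetical."""


def scaling_rule(name, metric, unit, threshold, adjustment):
    """Build one EMR automatic-scaling rule driven by a CloudWatch metric."""
    return {
        "Name": name,
        "Action": {
            "SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": adjustment,  # nodes added when the alarm fires
                "CoolDown": 300,
            }
        },
        "Trigger": {
            "CloudWatchAlarmDefinition": {
                "ComparisonOperator": "GREATER_THAN",
                "EvaluationPeriods": 1,
                "MetricName": metric,
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": threshold,
                "Unit": unit,
                "Dimensions": [{"Key": "JobFlowId", "Value": "${emr.clusterId}"}],
            }
        },
    }


def build_cluster_request(name, bootstrap_script, step_args):
    """Assemble keyword arguments for the boto3 EMR client's run_job_flow."""
    master_group = {
        "Name": "master",
        "InstanceRole": "MASTER",
        "InstanceType": "m5.xlarge",  # assumed instance type
        "InstanceCount": 1,
    }
    core_group = {
        "Name": "core",
        "InstanceRole": "CORE",
        "InstanceType": "r5.2xlarge",  # assumed memory-optimized type
        "InstanceCount": 2,
        "AutoScalingPolicy": {
            "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
            "Rules": [
                scaling_rule("scale-out-pending-containers",
                             "ContainerPendingRatio", "COUNT", 0.75, 1),
                scaling_rule("scale-out-hdfs-usage",
                             "HDFSUtilization", "PERCENT", 80.0, 1),
            ],
        },
    }
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumed release
        "Applications": [{"Name": "Hadoop"}],
        "Instances": {
            "InstanceGroups": [master_group, core_group],
            # False makes the cluster terminate once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
            "TerminationProtected": False,
        },
        # The bootstrap script runs on each node before applications start.
        "BootstrapActions": [{
            "Name": "copy-app-files",
            "ScriptBootstrapAction": {"Path": bootstrap_script, "Args": []},
        }],
        "Steps": [{
            "Name": "batch-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {"Jar": "command-runner.jar", "Args": list(step_args)},
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
    }


def launch(emr_client, request):
    """Submit the request; emr_client is expected to be boto3.client("emr")."""
    return emr_client.run_job_flow(**request)["JobFlowId"]
```

In a batch script, the request would be built once per run and passed to launch with a real EMR client. Because KeepJobFlowAliveWhenNoSteps is False, the cluster shuts down after the last step, which corresponds to the automatic launch and termination pattern described in the list.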

AWS Services Used:

  • Amazon EMR
  • Amazon EC2
  • Amazon S3
  • Amazon RDS


Data management service with Amazon EMR

With Amazon EMR, Clarivate has access to the scale required to find meaningful connections in its data and deliver actionable insights to its customers. By harnessing the synergy within the AWS ecosystem, and because Amazon EMR is a managed service, the company no longer has to plan for downtime to deploy enhancements or features.

Amazon EMR gave Clarivate key benefits, including:

  • Significantly optimized use of cloud infrastructure
  • Enhanced security through separate AWS accounts for development and production, role-based access, and security groups
  • Improved platform stability