Pyspark Developer - Lead Consultant

Chennai, TN, US

Pyspark Developer - Lead Consultant - (CREQ152063)



Skill: Pyspark Developer

Role: T1

Our Big Data Competency Center offers consulting services for big data development, tool evaluations, solution architecture and design, and end-to-end implementation services using big data ecosystems like Hadoop, NoSQL, graph databases, and appliances. Our state-of-the-art Advanced Technology Centers in the United States and India have the infrastructure to conduct R&D as well as PoCs across multiple technologies. We help measure enterprise analytics maturity, deliver an analytics roadmap, and recommend embedding analytics into existing operations to measure business outcomes. Our Advanced Analytics Competency Center offers comprehensive services in business analytics that provide decision support, trending, and forecasting.

The service offering range spans from traditional BI use cases to more complex and advanced analytical solutions like Social Analytics, Web Data Analytics, Text Analytics, Real-time BI, and Predictive Analytics (Predictive Analytics Models such as Targeting Customers by Direct Marketing, Market Basket Analysis, Case Management), Model Implementation, Advanced Data Visualization, End User Application, etc.

Key responsibilities:

  • Set up, administer, monitor, tune, optimize, and govern large-scale Hadoop clusters and Hadoop components (on-premise/cloud) to meet high availability/uptime requirements.
  • Design and implement new components and various emerging technologies in the Hadoop ecosystem, and ensure successful execution of various Proof-of-Technology (PoT) / Proof-of-Concept (PoC) efforts.
  • Collaborate with cross-functional teams (infrastructure, network, database, application) on activities such as deploying new hardware/software, provisioning environments, and capacity uplifts.
  • Work with various teams to set up new Hadoop users, security, and platform governance.
  • Create and execute a capacity planning strategy and process for the Hadoop platform.
  • Work on cluster maintenance as well as creation and removal of nodes using tools like Ganglia, Nagios, Ambari etc.
  • Performance tuning of Hadoop clusters and various Hadoop components and routines.
  • Monitor job performances, file system/disk-space management, cluster database connectivity, log files, management of backup/security and troubleshooting various user issues.
  • Harden the cluster to support use cases and self-service in a 24x7 model, and apply advanced troubleshooting techniques to critical, highly complex customer problems.
  • Contribute to the evolving Hadoop architecture of our services to meet changing requirements for scaling, reliability, performance, manageability, and price.
  • Set up monitoring and alerts for the Hadoop cluster, including creation of dashboards, alerts, and weekly status reports for uptime, usage, issues, etc.
  • Design, implement, test, and document a performance benchmarking strategy for the platform as well as for each use case.
  • Act as a liaison between the Hadoop cluster administrators and the Hadoop application development team to identify and resolve issues impacting application availability, scalability, performance, and data throughput.
  • Research Hadoop user issues in a timely manner and follow up directly with the customer with recommendations and action plans.
  • Work with project team members to help propagate knowledge and efficient use of Hadoop tool suite and participate in technical communications within the team to share best practices and learn about new technologies and other ecosystem applications.
  • Automate deployment and management of Hadoop services including implementing monitoring.
  • Drive customer communication during critical events and participate/lead various operational improvement initiatives.


  • Developer must have sound knowledge of Apache Spark programming with 6+ years of experience. Deep experience in developing data-processing tasks using PySpark, such as reading data from external sources, merging data, performing data enrichment, and loading into target data destinations. Ability to design, build, and unit test applications in Spark or PySpark. In-depth knowledge of Hadoop, Spark, and similar frameworks.
  • Ability to understand existing ETL logic and convert it into Spark, PySpark, or Spark SQL. Knowledge of Unix shell scripting, RDBMS, Hive, the HDFS file system, HDFS file types, and HDFS compression codecs.
  • Experience in processing large amounts of structured and unstructured data, including integrating data from multiple sources.

About Virtusa

Teamwork, quality of life, professional and personal development: values that Virtusa is proud to embody. When you join us, you join a team of 30,000+ people globally that cares about your growth, one that seeks to provide you with exciting projects and opportunities, and work with state-of-the-art technologies throughout your career with us.

Great minds, great potential: it all comes together at Virtusa. We value collaboration and the team environment of our company, and seek to provide great minds with a dynamic place to nurture new ideas and foster excellence.

Virtusa was founded on principles of equal opportunity for all, and so does not discriminate on the basis of race, religion, color, sex, gender identity, sexual orientation, age, non-disqualifying physical or mental disability, national origin, veteran status or any other basis covered by appropriate law. All employment is decided on the basis of qualifications, merit, and business need.

Primary Location: IN-TN-Chennai

: Full Time

Employee Status: Individual Contributor

Job Type: Experienced

: No

Job Posting: 24/11/2022, 1:45:48 PM