Job Description
Data Engineer (Spark/Scala/Python) - Onsite and Offsite Work

Location: Remote Opportunity
Budget (Remote): AED 10,000 (Maximum)
Budget (Onsite): AED 19,000 (Maximum)
Notice Period: Early joiners preferred (Maximum 30 days)

Job Description:
We are looking for a highly skilled and motivated Spark Data
Engineer to join our team. The ideal candidate will have a strong background in Apache Spark, data ingestion, data processing, and data integration, and will be responsible for developing and maintaining our dynamic data ingestion framework. The candidate should have expertise in building scalable, high-performance, fault-tolerant data processing pipelines with Spark, and should be able to optimize Spark jobs for performance and scalability. The candidate should also have experience designing and implementing data models, handling data errors, implementing data quality and validation processes, and integrating Spark applications with other big data technologies in the Hadoop ecosystem.

Responsibilities:
• Develop and maintain a dynamic data ingestion framework using Apache Spark
• Implement data ingestion pipelines for batch processing and real-time streaming using Spark's data ingestion APIs
• Design and implement data models using Spark's DataFrame and Dataset APIs
• Optimize Spark jobs for performance and scalability, including caching, broadcasting, and data partitioning techniques
• Implement error handling and fault tolerance mechanisms to handle data errors, processing failures, and system failures in Spark applications
• Implement data quality and validation processes, including data profiling, data cleansing, and data validation rules, using Spark's data processing APIs
• Integrate Spark applications with other big data technologies in the Hadoop ecosystem, such as Hive, HBase, and Kafka
• Ensure data security by implementing data encryption, data masking, and data access controls in Spark applications
• Use version control systems such as Git for source code management, and apply DevOps practices such as continuous integration, continuous delivery, and automated deployments in Spark application development workflows

Qualifications:
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field
• Strong proficiency in Apache Spark, including Spark Core, Spark SQL, Spark Streaming, and Spark MLlib, with experience developing and deploying multiple production workloads
• Proficiency in Scala or Python, with knowledge of functional programming concepts
• Experience developing and maintaining dynamic data ingestion frameworks using Spark
• Experience in data processing, data integration, and data modeling using Spark's DataFrame and Dataset APIs
• Knowledge of performance optimization techniques in Spark, including caching, broadcasting, and data partitioning
• Experience implementing error handling and fault tolerance mechanisms in Spark applications
• Knowledge of data quality and validation techniques using Spark's data processing APIs
• Familiarity with other big data technologies in the Hadoop ecosystem, such as Hive, HBase, and Kafka
• Experience implementing data security measures in Spark applications, such as data encryption, data masking, and data access controls
• Strong problem-solving skills and the ability to troubleshoot and resolve issues in Spark applications
• Proficiency with version control systems such as Git, and experience applying DevOps practices in Spark application development workflows
• Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment
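To give candidates a feel for the data quality and validation work described above, here is a minimal, Spark-free Python sketch of row-level validation rules. In production, checks like these would typically be expressed as Spark DataFrame filters or UDFs; the field names, rule thresholds, and helper functions below are purely illustrative assumptions, not part of our actual framework.

```python
# Illustrative row-level data validation and cleansing, mirroring the kind
# of rules that would normally run as Spark DataFrame operations or UDFs.
# All field names ("id", "amount", "currency") are hypothetical examples.

def validate_record(record: dict) -> list:
    """Return a list of validation errors for one ingested record."""
    errors = []
    if not record.get("id"):
        errors.append("missing id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        errors.append("amount must be a non-negative number")
    if record.get("currency") not in {"AED", "USD", "EUR"}:
        errors.append("unsupported currency")
    return errors


def split_valid_invalid(records):
    """Partition records into (valid, invalid) -- a common cleansing step
    before loading data into downstream tables. Invalid records are kept
    alongside their errors for quarantine or reprocessing."""
    valid, invalid = [], []
    for record in records:
        errs = validate_record(record)
        if errs:
            invalid.append((record, errs))
        else:
            valid.append(record)
    return valid, invalid


if __name__ == "__main__":
    rows = [
        {"id": "1", "amount": 10.5, "currency": "AED"},
        {"id": "", "amount": -3, "currency": "GBP"},
    ]
    good, bad = split_valid_invalid(rows)
    print(len(good), len(bad))  # 1 1
```

The same split-and-quarantine pattern scales up naturally in Spark: the validation function becomes a UDF or a set of column expressions, and the valid/invalid partition becomes two filtered DataFrames.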