Apache Flume Jobs in Chennai

11+ Apache Flume Jobs in Chennai | Apache Flume Job openings in Chennai

Apply to 11+ Apache Flume Jobs in Chennai on CutShort.io. Explore the latest Apache Flume Job opportunities across top companies like Google, Amazon & Adobe.

Big Data Developer

at GeakMinds Technologies Pvt Ltd

3 recruiters

Posted by John Richardson

Chennai

1 - 5 yrs

₹1L - ₹6L / yr

Hadoop

Big Data

HDFS

Apache Sqoop

Apache Flume

• Looking for a Big Data Engineer with 3+ years of experience.
• Hands-on experience with MapReduce-based platforms, such as Pig, Spark, and Shark.
• Hands-on experience with data pipeline tools, such as Kafka, Storm, and Spark Streaming.
• Store and query data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto.
• Hands-on experience managing Big Data on a cluster with HDFS and MapReduce.
• Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm.
• Experience with Azure cloud, Cognitive Services, and Databricks is preferred.

Software developer

Tier 1 MNC

Agency job

via People First Consultants by Jayaraj E

Chennai, Pune, Bengaluru (Bangalore), Noida, Gurugram, Kochi (Cochin), Coimbatore, Hyderabad, Mumbai, Navi Mumbai

3 - 12 yrs

₹3L - ₹15L / yr

Spark

Hadoop

Big Data

Data engineering

PySpark

Greetings,
We are hiring software developers for a Tier 1 MNC. Candidates should have good knowledge of Spark, Hadoop, and Scala.

Data Engineer

at Ganit Business Solutions

3 recruiters

Posted by Viswanath Subramanian

Chennai, Bengaluru (Bangalore), Mumbai

4 - 6 yrs

₹7L - ₹15L / yr

SQL

Amazon Web Services (AWS)

Data Warehouse (DWH)

Informatica

ETL

Responsibilities:

Must be able to write quality code and build secure, highly available systems.
Assemble large, complex datasets that meet functional / non-functional business requirements.
Identify, design, and implement internal process improvements, with guidance: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
Monitor performance and advise on any necessary infrastructure changes.
Define data retention policies.
Implement the ETL process and an optimal data pipeline architecture.
Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
Create design documents that describe the functionality, capacity, architecture, and process.
Develop, test, and implement data solutions based on finalized design documents.
Work with data and analytics experts to strive for greater functionality in our data systems.
Proactively identify potential production issues and recommend and implement solutions.

Skillsets:

Good understanding of optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
Proficient understanding of distributed computing principles.
Experience working with batch processing / real-time systems using various open-source technologies, such as NoSQL, Spark, Pig, Hive, and Apache Airflow.
Experience implementing complex projects dealing with petabyte-scale (PB) data.
Optimization techniques (performance, scalability, monitoring, etc.).
Experience integrating data from multiple data sources.
Experience with NoSQL databases, such as HBase, Cassandra, MongoDB, etc.
Knowledge of various ETL techniques and frameworks, such as Flume.
Experience with various messaging systems, such as Kafka or RabbitMQ.
Good understanding of Lambda Architecture, along with its advantages and drawbacks.
Creation of DAGs for data engineering.
Expert in Python/Scala programming, especially for data engineering/ETL purposes.

Big data Cloud

at Altimetrik

8 recruiters

Agency job

via SOT-Science of talent Acquisition consulting services Pvt Ltd by Mahesh Kumar

Chennai, Hyderabad

5 - 10 yrs

₹10L - ₹25L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

Big Data with cloud:

Experience: 5-10 years

Location: Hyderabad/Chennai

Notice period: 15-20 days max

1. Expertise in building AWS data engineering pipelines with AWS Glue -> Athena -> QuickSight

2. Experience developing Lambda functions with AWS Lambda

3. Expertise with Spark/PySpark: the candidate should be hands-on with PySpark code and able to do transformations with Spark

4. Should be able to code in Python and Scala.

5. Snowflake experience will be a plus

Python + Data scientist

A leading global information technology and business process

Agency job

via Jobdost by Mamatha A

Chennai

5 - 14 yrs

₹13L - ₹21L / yr

Python

Java

PySpark

Javascript

Hadoop

Python + Data scientist:
• Hands-on and sound knowledge of Python, PySpark, and JavaScript

• Build data-driven models to understand the characteristics of engineering systems

• Train, tune, validate, and monitor predictive models

• Sound knowledge of statistics

• Experience developing data processing tasks using PySpark, such as reading, merging, enriching, and loading data from external systems into target data destinations

• Working knowledge of Big Data and/or Hadoop environments

• Experience creating CI/CD pipelines using Jenkins or similar tools

• Practiced in eXtreme Programming (XP) disciplines

Data Engineer

at Bungee Tech India

Posted by Abigail David

Remote, NCR (Delhi | Gurgaon | Noida), Chennai

5 - 10 yrs

₹10L - ₹30L / yr

Big Data

Hadoop

Apache Hive

Spark

ETL

Company Description

At Bungee Tech, we help retailers and brands meet customers everywhere and on every occasion. We believe that accurate, high-quality data matched with compelling market insights empowers retailers and brands to keep their customers at the center of all the innovation and value they deliver.

We provide retailers and brands with a clear and complete omnichannel picture of their competitive landscape. We collect billions of data points every day, multiple times a day, from publicly available sources. Using high-quality extraction, we uncover detailed information on products or services, which we automatically match and then proactively track for price, promotion, and availability. Plus, anything we do not match helps identify a new assortment opportunity.

Empowered with this unrivalled intelligence, we unlock compelling analytics and insights that, once blended with verified partner data from trusted sources such as Nielsen, paint a complete, consolidated picture of the competitive landscape.

We are looking for a Big Data Engineer who will work on collecting, storing, processing, and analyzing huge data sets. The primary focus will be on choosing optimal solutions for these purposes, then implementing, maintaining, and monitoring them.

You will also be responsible for integrating them with the architecture used in the company.

We're working on the future. If you are seeking an environment where you can drive innovation, if you want to apply state-of-the-art software technologies to solve real-world problems, and if you want the satisfaction of providing visible benefit to end users in an iterative, fast-paced environment, this is your opportunity.

Responsibilities

As an experienced member of the team, in this role, you will:

Contribute to evolving the technical direction of analytical systems and play a critical role in their design and development

You will research, design and code, troubleshoot and support. What you create is also what you own.

Develop the next generation of automation tools for monitoring and measuring data quality, with associated user interfaces.

Be able to broaden your technical skills and work in an environment that thrives on creativity, efficient execution, and product innovation.

BASIC QUALIFICATIONS

Bachelor’s degree or higher in an analytical area such as Computer Science, Physics, Mathematics, Statistics, Engineering or similar.
5+ years relevant professional experience in Data Engineering and Business Intelligence
5+ years of experience with advanced SQL (analytical functions), ETL, and data warehousing.
Strong knowledge of data warehousing concepts, including data warehouse technical architectures, infrastructure components, ETL/ ELT and reporting/analytic tools and environments, data structures, data modeling and performance tuning.
Ability to effectively communicate with both business and technical teams.
Excellent coding skills in Java, Python, C++, or equivalent object-oriented programming language
Understanding of relational and non-relational databases and basic SQL
Proficiency with at least one of these scripting languages: Perl / Python / Ruby / shell script

PREFERRED QUALIFICATIONS

Experience with building data pipelines from application databases.
Experience with AWS services - S3, Redshift, Spectrum, EMR, Glue, Athena, ELK etc.
Experience working with Data Lakes.
Experience providing technical leadership and mentoring other engineers on best practices in the data engineering space
Sharp problem-solving skills and the ability to resolve ambiguous requirements
Experience working with Big Data
Knowledge of and experience working with Hive and the Hadoop ecosystem
Knowledge of Spark
Experience working with Data Science teams

Data Scientist

at TVS Credit Services

2 recruiters

Posted by Vinodhkumar Panneerselvam

Chennai

4 - 10 yrs

₹10L - ₹20L / yr

Data Science

R Programming

Python

Machine Learning (ML)

Hadoop

Job Description:
Be responsible for scaling our analytics capability across all internal disciplines and guide our strategic direction with regard to analytics.
Organize and analyze large, diverse data sets across multiple platforms.
Identify key insights and leverage them to inform and influence product strategy.
Interact technically with vendors or partners on scope, approach, and deliverables.
Develop proofs of concept to prove or disprove the validity of a concept.
Work with all parts of the business to identify analytical requirements and formalize an approach for reliable, relevant, accurate, and efficient reporting on those requirements.
Design and implement advanced statistical testing for customized problem solving.
Deliver concise verbal and written explanations of analyses to senior management that elevate findings into strategic recommendations.

Desired Candidate Profile:
MTech / BE / BTech / MSc in CS, Stats, Maths, Operations Research, Statistics, Econometrics, or any quantitative field
Experience using Python, R, and SAS
Experience working with large data sets and big data systems (SQL, Hadoop, Hive, etc.)
Keen aptitude for large-scale data analysis with a passion for identifying key insights from data
Expert working knowledge of various machine learning algorithms, such as XGBoost, SVM, etc.

We are looking for candidates with one of the following:
Experience in Unsecured Loans & SME Loans analytics (cards, installment loans) - risk-based pricing analytics
Experience in Differential pricing / selection analytics (retail, airlines / travel, etc.)
Experience in Digital product companies or Digital eCommerce, with a product mindset and experience
Experience in Fraud / Risk from Banks, NBFC / Fintech / Credit Bureau
Experience in Online media with knowledge of media, online ads & sales (agencies) - knowledge of DMP, DFP, Adobe/Omniture tools, Cloud
Experience in Consumer Durable Loans lending companies (experience in Credit Cards, Personal Loan - optional)
Experience in Tractor Loans lending companies (experience in Farm)
Experience in Recovery, Collections analytics
Experience in Marketing Analytics with Digital Marketing, Market Mix modelling, Advertising Technology

Big Data Developer / Lead / Architect

Telecom Client

Agency job

via Eurka IT SOL by Srikanth a

Chennai

5 - 13 yrs

₹9L - ₹28L / yr

PySpark

Data engineering

Big Data

Hadoop

Spark

· Demonstrable experience owning and developing big data solutions using Hadoop, Hive/HBase, Spark, Databricks, and ETL/ELT for 5+ years

· 10+ years of Information Technology experience, preferably with Telecom / wireless service providers.

· Experience designing data solutions following Agile practices (SAFe methodology); designing for testability, deployability, and releasability; rapid prototyping, data modeling, and decentralized innovation

DataOps mindset: allowing the architecture of a system to evolve continuously over time while simultaneously supporting the needs of current users
Create and maintain the Architectural Runway and Non-Functional Requirements.
Design for the Continuous Delivery Pipeline (CI/CD data pipeline) and enable Built-in Quality & Security from the start.

· Ability to demonstrate an understanding, and ideally use, of at least one recognised architecture framework or standard, e.g. TOGAF, Zachman Architecture Framework, etc.

· The ability to apply data, research, professional judgment, and experience to ensure our products make the biggest difference to consumers

· Demonstrated ability to work collaboratively

· Excellent written, verbal, and social skills: you will be interacting with all types of people (user experience designers, developers, managers, marketers, etc.)

· Ability to work in a fast-paced, multi-project environment independently and with minimal supervision

· Technologies: .NET, AWS, Azure; Azure Synapse, NiFi, RDS, Apache Kafka, Azure Databricks, Azure Data Lake Storage, Power BI, Reporting Analytics, QlikView, on-prem SQL data warehouse; BSS, OSS & Enterprise Support Systems

Big Data Engineer

at netmedscom

3 recruiters

Posted by Vijay Hemnath

Chennai

2 - 5 yrs

₹6L - ₹25L / yr

Big Data

Hadoop

Apache Hive

Scala

Spark

We are looking for an outstanding Big Data Engineer with experience setting up and maintaining Data Warehouses and Data Lakes for an organization. This role will collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Roles and Responsibilities:

Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using 'Big Data' technologies.
Develop programs in Scala and Python as part of data cleaning and processing.
Assemble large, complex data sets that meet functional / non-functional business requirements and foster data-driven decision making across the organization.
Design and develop distributed, high-volume, high-velocity, multi-threaded event processing systems.
Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for key stakeholders and the business processes that depend on it.
Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
Provide operational excellence, guaranteeing high availability and platform stability.
Collaborate closely with the Data Science team and help the team build and deploy machine learning and deep learning models on big data analytics platforms.

Skills:

Experience with Big Data pipeline, Big Data analytics, Data warehousing.
Experience with SQL/No-SQL, schema design and dimensional data modeling.
Strong understanding of Hadoop architecture and the HDFS ecosystem, and experience with the Big Data technology stack, such as HBase, Hadoop, Hive, and MapReduce.
Experience in designing systems that process structured as well as unstructured data at large scale.
Experience in AWS/Spark/Java/Scala/Python development.
Strong skills in PySpark (Python & Spark). Ability to create, manage, and manipulate Spark DataFrames. Expertise in Spark query tuning and performance optimization.
Experience in developing efficient software code/frameworks for multiple use cases leveraging Python and big data technologies.
Prior exposure to streaming data sources such as Kafka.
Knowledge of shell scripting and Python scripting.
High proficiency in database skills (e.g., Complex SQL), for data preparation, cleaning, and data wrangling/munging, with the ability to write advanced queries and create stored procedures.
Experience with NoSQL databases such as Cassandra / MongoDB.
Solid experience in all phases of Software Development Lifecycle - plan, design, develop, test, release, maintain and support, decommission.
Experience with DevOps tools (GitHub, Travis CI, and JIRA) and methodologies (Lean, Agile, Scrum, Test Driven Development).
Experience building and deploying applications on on-premises and cloud-based infrastructure.
A good understanding of the machine learning landscape and concepts.

Qualifications and Experience:

Engineering and postgraduate candidates, preferably in Computer Science, from premier institutions, with proven work experience as a Big Data Engineer or in a similar role for 3-5 years.

Certifications:

Good to have at least one of the Certifications listed here:

AZ 900 - Azure Fundamentals

DP 200, DP 201, DP 203, AZ 204 - Data Engineering

AZ 400 - DevOps Certification

Lead Data Engineer

at Lymbyc

2 recruiters

Posted by Venky Thiriveedhi

Bengaluru (Bangalore), Chennai

4 - 8 yrs

₹9L - ₹14L / yr

Apache Spark

Apache Kafka

Druid Database

Big Data

Apache Sqoop

Key skill set: Apache NiFi, Kafka Connect (Confluent), Sqoop, Kylo, Spark, Druid, Presto, RESTful services, Lambda / Kappa architectures

Responsibilities:
- Build a scalable, reliable, operable, and performant big data platform for both streaming and batch analytics
- Design and implement data aggregation, cleansing, and transformation layers

Skills:
- 4+ years of hands-on experience designing and operating large data platforms
- Experience in Big Data ingestion, transformation, and stream/batch processing technologies using Apache NiFi, Apache Kafka, Kafka Connect (Confluent), Sqoop, Spark, Storm, Hive, etc.
- Experience designing and building streaming data platforms in Lambda and Kappa architectures
- Working experience with at least one NoSQL or OLAP data store, such as Druid, Cassandra, Elasticsearch, Pinot, etc.
- Experience with at least one data warehousing tool, such as Redshift, BigQuery, or Azure SQL Data Warehouse
- Exposure to other data ingestion, data lake, and querying frameworks, such as Marmaray, Kylo, Drill, and Presto
- Experience designing and consuming microservices
- Exposure to security and governance tools, such as Apache Ranger and Apache Atlas
- Any contributions to open source projects are a plus
- Experience in performance benchmarks will be a plus

Data Engineer

at Mobile Programming LLC

34 recruiters

Posted by vandana chauhan

Remote, Chennai

3 - 7 yrs

₹12L - ₹18L / yr

Big Data

Amazon Web Services (AWS)

Hadoop

SQL

Python

Position: Data Engineer
Location: Chennai - Guindy Industrial Estate
Duration: Full-time role
Company: Mobile Programming (https://www.mobileprogramming.com/)
Client Name: Samsung

We are looking for a Data Engineer to join our growing team of analytics experts. The hire will be responsible for expanding and optimizing our data and data pipeline architecture, as well as optimizing data flow and collection for cross-functional teams. The ideal candidate is an experienced data pipeline builder and data wrangler who enjoys optimizing data systems and building them from the ground up. The Data Engineer will support our software developers, database architects, data analysts, and data scientists on data initiatives and will ensure optimal data delivery architecture is consistent throughout ongoing projects. They must be self-directed and comfortable supporting the data needs of multiple teams, systems, and products.

Responsibilities for Data Engineer
• Create and maintain optimal data pipeline architecture.
• Assemble large, complex data sets that meet functional / non-functional business requirements.
• Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
• Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS big data technologies.
• Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency, and other key business performance metrics.
• Work with stakeholders including the Executive, Product, Data, and Design teams to assist with data-related technical issues and support their data infrastructure needs.
• Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
• Work with data and analytics experts to strive for greater functionality in our data systems.

Qualifications for Data Engineer
• Experience building and optimizing big data ETL pipelines, architectures, and data sets.
• Advanced working SQL knowledge and experience working with relational databases and query authoring (SQL), as well as working familiarity with a variety of databases.
• Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.
• Strong analytic skills related to working with unstructured datasets.
• Build processes supporting data transformation, data structures, metadata, dependency, and workload management.
• A successful history of manipulating, processing, and extracting value from large, disconnected datasets.
• Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores.
• Strong project management and organizational skills.
• Experience supporting and working with cross-functional teams in a dynamic environment.
 Experience supporting and working with cross-functional teams in a dynamic environment.

We are looking for a candidate with 3-6 years of experience in a Data Engineer role who has attained a graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field. They should also have experience using the following software/tools:
• Experience with big data tools: Spark, Kafka, HBase, Hive, etc.
• Experience with relational SQL and NoSQL databases
• Experience with AWS cloud services: EC2, EMR, RDS, Redshift
• Experience with stream-processing systems: Storm, Spark Streaming, etc.
• Experience with object-oriented/object function scripting languages: Python, Java, Scala, etc.

Skills: Big Data, AWS, Hive, Spark, Python, SQL

Get to hear about interesting companies hiring right now

Follow Cutshort

Why apply via Cutshort?

Connect with actual hiring teams and get their fast response. No spam.
