About the Program
Data Engineering Tools encompass a suite of technologies and frameworks essential for the effective management, processing, and analysis of large datasets. Data engineers play a pivotal role in this domain: they design and maintain complex data pipelines and ensure that critical information remains reliable and accessible. Their expertise underpins data-driven decision-making and supports a wide range of data-intensive applications, from business intelligence platforms to machine learning systems. This specialized skill set is in high demand in today's data-centric landscape, making data engineers invaluable assets in the IT industry.
Courses
Course | Credits | Semester
- Hadoop Eco System with HDFS and MapReduce | 3 | III
- Data Processing with Hive and Pig Latin | 3 | IV
- Complete Python for Data Engineers | 3 | V
- PySpark for Data Engineers | 3 | V
- Cloud Data Engineering with AWS, GCP, and Azure | 3 | VI
- Real-Time Data Engineering with Streaming Tools | 3 | VII
- Project Work - The Data Services Capstone: Exploring Big Data Tools | 3 | VIII

Course | Credits | Semester
- IT World Essentials: Your Digital Entrypoint | 3 | I
- Critical Thinking, Design Thinking, Leadership and Teamwork | 3 | II
- Project Work - The Data Services Capstone: Exploring Big Data Tools | 3 | VIII

Course | Credits | Semester
- Critical Thinking, Design Thinking, Leadership and Teamwork | 3 | II
- Career Readiness in Digital Era | 3 | VI
Mode of Delivery
- Self-paced learning – 10 hours
- VILT sessions – 28 hours
- Project work – 7 hours
- Face-to-face instructor-led sessions / VILT sessions (including project work) – 45 hours
- Self-paced learning + Expert session – 30 hours
- Project work – 15 hours
Job Roles
- Data Engineer
- Data Integration Engineer
- Big Data Analyst
Software Tools
- Python
- Scala
- Presto
- Hive
- Pig
- Flink
- Zeppelin
- Oozie
- Kafka
- PySpark
- Databricks
- MySQL
- Cassandra
- MongoDB
- Hadoop
- Airflow
- Spark
- Ambari
- ZooKeeper
- Flume
- Sqoop
Skills
- Designing efficient and scalable data structures for modeling and analysis.
- Working with NoSQL databases like MongoDB and Cassandra for unstructured data.
- Implementing centralized data-warehousing solutions for storing large datasets.
- Utilizing Hadoop and Spark for big data processing and distributed computing.
- Integrating data from multiple sources into a unified system.
- Managing large datasets effectively with the Hadoop and Spark ecosystems for performance.
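The Hadoop and MapReduce skills listed above rest on a simple map–shuffle–reduce pattern. As a minimal illustration only (a pure-Python sketch of the idea, not Hadoop itself; all function names and sample data here are invented), a word count might look like:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (key, 1) pair for every word in the input.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values per key.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "Big Data tools"])))
print(counts)  # {'big': 2, 'data': 2, 'tools': 1}
```

In a real cluster, the map and reduce phases run in parallel across many machines and the shuffle moves data over the network; the program's logic, however, stays this small.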
