
Data Masters Program

One Program, 3 Courses
Data Science | Big Data | Data Analysis

Career Roles for Data Masters Graduates

  • Data Scientist
  • Data Engineer
  • Data Architect
  • Big Data Engineer
  • Machine Learning Engineer

How the Data Masters Program Works

Introduction to Programming with Python

Python is a versatile language that plays a major role in data science. In this section, you’ll learn the basics of Python: its philosophy, fundamental concepts, and key features.

Database Programming with Python

An understanding of relational databases is a must for every data scientist. This section covers MySQL, a popular relational database, and introduces database programming using Python.
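The core pattern of database programming in Python is: connect, get a cursor, run parameterized SQL, fetch the rows, and close the connection. The course uses MySQL. The sketch below instead uses Python's built-in `sqlite3` module so it runs without a database server. It follows the same DB-API style that MySQL drivers such as mysql-connector-python also follow, though connection arguments and placeholder syntax differ. The `students` table and its rows are hypothetical.

```python
import sqlite3

def top_students(min_score):
    # In-memory SQLite database; a MySQL driver would connect to a server instead.
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Hypothetical table and data, for illustration only.
        cur.execute("CREATE TABLE students (name TEXT, score INTEGER)")
        cur.executemany(
            "INSERT INTO students (name, score) VALUES (?, ?)",
            [("Asha", 91), ("Ravi", 78), ("Meera", 85)],
        )
        conn.commit()
        # Parameterized query: values are passed separately, never concatenated into SQL.
        cur.execute(
            "SELECT name, score FROM students WHERE score >= ? ORDER BY score DESC",
            (min_score,),
        )
        return cur.fetchall()
    finally:
        conn.close()

print(top_students(80))  # [('Asha', 91), ('Meera', 85)]
```

SQLite uses `?` as its placeholder, while common MySQL drivers use `%s`. The habit of passing values separately from the SQL string carries over unchanged and guards against SQL injection.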

Data Analysis with Tableau

The final step in the data science pipeline is to communicate the results or findings. In this section, you’ll explore communication and visualization concepts needed by data scientists.

Introduction to Big Data and HDFS

Companies have huge amounts of data to process, and Big Data tools and techniques make it possible to process large volumes of structured and unstructured data. In this section, you’ll be introduced to Hadoop and its popular distributed file system, HDFS (Hadoop Distributed File System).

Data Ingestion with Sqoop, Flume, and Kafka

Companies have huge volumes of data lying in silos. It is therefore very important for a data engineer to understand how to ingest data from various sources and build data lakes for Big Data analysis. In this section, we will cover data ingestion tools such as Sqoop, Flume, and Kafka.

Big Data Processing Using MapReduce

MapReduce is a processing technique and programming model for distributed computing. In this section, you will learn the basics of the MapReduce programming paradigm and how it revolutionized Big Data processing by spreading work across a cluster of machines.
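Here is a single-process sketch of the classic word count job, the same example listed in the curriculum, to make the paradigm concrete. It writes the map, shuffle, and reduce phases as plain Python functions. It only simulates the data flow on one machine. A real Hadoop job uses Hadoop's Java `Mapper`/`Reducer` classes (or Hadoop Streaming) and runs each phase in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    # Mapper: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reducer: combine every value emitted for one key.
    return key, sum(values)

def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    # Sorting keys mimics the sorted input a Hadoop reducer receives.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))

print(word_count(["big data is big", "data is everywhere"]))
# {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

The design point is that each mapper call sees only one line and each reducer call sees only one key. That independence is what lets a framework distribute both phases across many machines.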

Big Data Processing Using Apache Spark

Apache Spark is an open-source cluster computing framework that uses in-memory primitives to deliver performance up to 100 times faster than Hadoop MapReduce for certain workloads. In this section, you will explore both batch and real-time data processing using Apache Spark.
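Spark separates transformations (such as `map` and `filter`), which only describe a computation, from actions (such as `collect`), which trigger it. This lazy execution, listed in the curriculum, lets Spark plan a whole pipeline before running it. The `LazyDataset` class below is a hypothetical, single-machine toy in plain Python that mimics this behaviour. It is not PySpark and does no distribution, partitioning, or caching.

```python
class LazyDataset:
    """Toy stand-in for a Spark RDD: records transformations, runs them on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: returns a new dataset and computes nothing yet.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only records the step.
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    def collect(self):
        # Action: only now is the recorded pipeline executed.
        items = iter(self._source)
        for kind, fn in self._steps:
            # Built-in map/filter (not the methods above) bind fn immediately.
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

calls = []

def square(x):
    calls.append(x)  # record each time the function actually runs
    return x * x

ds = LazyDataset(range(5)).map(square).filter(lambda v: v % 2 == 0)
print(len(calls))    # 0: no work done before the action
print(ds.collect())  # [0, 4, 16]
print(len(calls))    # 5
```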

Data Warehousing Using Hive

Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summarization, query, and analysis. This section covers various data processing and analysis techniques using Apache Hive.

Foundations of Data Science

Discover why data science matters and explore its main application areas. Python libraries such as NumPy, SciPy, pandas, matplotlib, and scikit-learn are used extensively in data science. This section takes a deep dive into how Python can be used to solve a range of data science problems.

Statistics for Data Science

Python is widely used for computing statistical parameters and building statistical models. In this section, you will be introduced to estimating various statistical measures of a data set, simulating random distributions, performing hypothesis testing, and building statistical models using Python tools.
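Several of these tasks can be done with Python's standard library alone. The sketch below uses the `statistics`, `math`, and `random` modules to estimate central tendency and dispersion, compute a one-sample t statistic, and simulate draws from a normal distribution. The data set, the hypothesised mean of 4, and the seed are illustrative choices.

```python
import math
import random
import statistics

def describe(data):
    # Central tendency and (population) dispersion of a data set.
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.pvariance(data),
        "std_dev": statistics.pstdev(data),
    }

def one_sample_t(data, mu0):
    # t = (sample mean - mu0) / (s / sqrt(n)),
    # where s is the sample standard deviation (n - 1 in its denominator).
    n = len(data)
    return (statistics.mean(data) - mu0) / (statistics.stdev(data) / math.sqrt(n))

def simulate_normal(n, mu, sigma, seed=42):
    # Draw n values from N(mu, sigma^2); the seed makes runs repeatable.
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(n)]

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(data))  # mean 5, median 4.5, mode 4, variance 4, std dev 2
print(round(one_sample_t(data, mu0=4), 3))  # 1.323
sample = simulate_normal(10_000, mu=0.0, sigma=1.0)
print(round(statistics.mean(sample), 2), round(statistics.stdev(sample), 2))
```

With 7 degrees of freedom, the two-sided 5% critical value of the t distribution is about 2.365. A t of about 1.32 therefore does not reject the hypothesis that the true mean is 4. Libraries such as SciPy (`scipy.stats.ttest_1samp`) compute the same statistic and also report a p-value.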

Machine Learning

In machine learning, computers apply statistical learning techniques to automatically identify patterns in data. This section takes a deep dive into supervised and unsupervised learning, as well as Gaussian and Naive Bayes methods. You will also be exposed to different classification, clustering, and regression techniques.
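K-NN is one of the supervised classifiers in the curriculum. Here it is written from scratch as a taste of supervised classification: a new point receives the majority label of the k closest training points. The tiny two-class data set and k=3 are hypothetical. In practice you would typically scale features first and use a library implementation such as scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two points of equal dimension.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    # Sort training points by distance to the query and keep the k nearest.
    neighbours = sorted(zip(train_X, train_y), key=lambda p: euclidean(p[0], query))[:k]
    # Majority vote among the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labelled training data: two well-separated groups.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.8, 1.6)))  # A
print(knn_predict(train_X, train_y, (6.2, 6.4)))  # B
```

K-NN has no training step beyond storing the data. All the work happens at prediction time, which is why it becomes slow on large data sets.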

Neural Networks, Deep Learning & Related Technologies

Deep learning is part of a broader family of machine learning methods based on the layers used in artificial neural networks. In this section, you’ll dive deep into the concepts of Neural Networks, Convolutional Neural Networks, Recurrent Neural Networks, Autoencoders, and more.
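The perceptron, listed in the curriculum under ANN, is the single-neuron building block of these networks. The sketch below trains one in plain Python on the logical AND function using the classic perceptron update rule. The step activation, learning rate of 1.0, and 20 epochs are illustrative choices, not course material.

```python
def step(z):
    # Step activation: fire (1) when the weighted sum is non-negative.
    return 1 if z >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(samples, lr=1.0, epochs=20):
    # samples: list of (inputs, target) pairs with targets 0 or 1.
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - predict(w, b, x)
            # Perceptron rule: move the weights toward the correct output.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([predict(w, b, x) for x, _ in AND])  # [0, 0, 0, 1]
```

A single perceptron cannot learn XOR, because no straight line separates its two classes. That limitation, the "non-linearity problem" in the curriculum, is what multi-layer networks trained with backpropagation address.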

Cloud Computing for Data Science

Cloud computing is growing rapidly in importance in the IT sector as more and more companies move away from traditional on-premises IT and migrate applications and business processes to the cloud. This section covers how to deploy data science models in cloud environments.

DevOps for Data Science

DevOps plays a pivotal role in bridging the gap between development and operations teams. This section covers the key DevOps tools a data scientist needs to know for day-to-day data science work.

Course Curriculum

1

Basics of Python Programming

Motivation & Applicability to various domains, Installation & Setting up path, Input / Output

Keywords and Identifiers, Variables and Data Types

Conditional Statements

Looping, Control flow (along with loops)

Strings and Features

String Manipulation, Functions

Collections/Sequences

Lists and Features

List Functions and Examples

List Comprehension

Tuple and Features

Functions and Methods

Dictionaries

Working with Dictionaries

Sets and Frozen sets

Working with collections

Working with Lists and Tuples

Working with Dictionaries

Working with Sets and Frozen Sets

Functional Programming

Types of functions

Function Arguments

Anonymous functions

Exception Handling

Try & finally clause

User Defined Exceptions

File handling and Modules

File operations Part - 1

File operations Part - 2

Modules, Importing modules, Packages

Math, Random and OS modules

Object oriented programming with Python

Classes and objects

Inheritance

Polymorphism

Data hiding (Abstraction)

Encapsulation

2

Introduction to MySQL

Connections and Queries

PL/SQL / MySQL

Multithreading

Regular expressions

3

Introduction to Big Data

Why Big Data?

Characteristics of Big Data – 4 Vs

Applications of Big Data

Introduction to Hadoop

HDFS – Hadoop Distributed File System

Components of HDFS

HDFS terminology

HDFS Federation

HDFS high availability

Role of ZooKeeper

Replica pipeline and network distance algorithm

HDFS Read and Write

Installing Hadoop on Windows/Mac using the Cloudera QuickStart VM

4

Introduction to Sqoop

Sqoop Architecture

Sqoop Import and Export with Examples

Introduction to Oozie

Oozie workflow

Oozie Action Tags

Oozie Parametrization

Flume – Spooling Directory

Kafka

5

Introduction to the MapReduce Framework

Mapper and Reducer APIs

First MapReduce program – Word Count

MapReduce examples – Inverted Index and Titanic Data Analysis

Modes of execution

Job execution in MRv1 vs. YARN

Serialization and Deserialization

Writable Classes

6


Introduction to Spark

Why Spark?

Applications of Spark

Spark Terminology

RDD

Architecture of Spark

Transformations and Actions

RDD Hierarchy

Lazy Execution

Shared Variables

RDD persistence

Spark SQL – DataFrames, Datasets and SQL

Real-time streaming with Kafka and Spark Streaming

7

Introduction to Hive

RDBMS vs. Hive

Hive DDL: Managed Table vs. External Table

Issues with delimiters

Hive Architecture

Partitioning – Static and Dynamic

Bucketing

Dealing with JSON data – using JSON SerDe

Hive UDF

Creating Views

File Formats – Avro, Parquet, ORC

Optimization Techniques

8

Introduction

High level view of Data Science, Artificial Intelligence & Machine Learning

Subtle differences between DS, ML & AI

Approaches to ML

Terms & terminologies of DS

Ideas of the pipeline and implementation cycle

9

Statistics

Measures of Central Tendency (Mean, Median, Mode)

Dispersion (Variance, Standard Deviation)

Types of Distributions

Scatter plot

Box whisker plot

Qualitative ideas of:

Statistical sampling & inference

Hypothesis Testing & t-tests

Confidence Intervals

Prerequisites of the above ideas (qualitative)

Terms, Terminology & Notions of Linear Algebra and Probability Relevant to Data Science

10

Supervised Learning

Classification methods & respective evaluation

K-NN

Decision Trees

Naive Bayes

Stochastic Gradient Descent

SVM –

Linear

Non-linear

Random Forest

XGBoost

Logistic regression

Ensemble methods

Combining models

Bagging

Boosting

Voting

Choosing the best classification method

Dimensionality reduction (with minimal theory)

PCA (usually presented as part of unsupervised learning)

LCA

LDA

Model Tuning

K-fold cross validation

Bias-variance tradeoff

L1 and L2 norms

Overfitting and underfitting, with learning curves illustrating bias-variance sensitivity (using graphs, not code)

Regression

Linear Regression

Variants of Regression

Lasso

Ridge

Multiple Linear Regression

Logistic Regression (effectively, classification only)

Regression Model Improvement – Tips and Tricks

Unsupervised learning

Clustering

K means

Hierarchical Clustering

Association Rules (market basket analysis)

Advanced Analytics

Time series

Time series analysis

ARIMA example

Recommender Systems

Content Based Recommendation

Collaborative Filtering

Text analytics

Natural Language Processing

Stemming, Lemmatization and Stop word removal

POS tagging and Named Entity Recognition

Bigrams, N-grams and collocations

Term Frequency and TF-IDF

11

ANN

NN & terminology

Non-linearity problem, illustration

Perceptron learning

Feed Forward Network and Backpropagation

Gradient Descent

Additional relevant Mathematics

Gradients

Partial derivatives

Linear algebra

Li

LD

Eigenvectors

Projections

Vector quantization

Overview of:

TensorFlow

Keras

Deep Learning with Convolutional Neural Nets

Architecture of CNN

Types of layers in CNN

Filters

Building an Image classifier with and without CNN

Recurrent neural nets

Fundamental notions & ideas

Recurrent neurons

Handling variable length sequences

Training a sequence classifier (ideas)

Training to predict time series

Reinforcement Learning (overview)

Autoencoders (overview)

12

Tableau Introduction

Understanding the start pane

Connecting to data source

Data sources that can be connected

Various file formats

Bookmarks

TDS, TDE

Connecting to Excel, Joins, Splitting data

Live and extract

Dimensions and measures

Clearing sorts and filters

Views: Standard, Fit Width and Fit Height

Drilling down

Expanding the marks in pane

Swapping axis

Renaming sheets

Editing color pane

Adding highlighters

Understanding Show Me

Sorting and hierarchy

Data pane and analytics pane

Different view options at bottom of sheet

Managing data and extracts in Tableau

Hiding and unhiding fields

Creating folders to move dimensions and measures

Adding default colors and properties

Adding multiple data sources

Extracting workbook

Replacing data sources

Data cleansing

Database joins

Blending

Sorting and Filtering in Tableau

Default charts

Highlighter for color and shape

Sorting from axis, color, category, manually and clearing sort

Creating groups from pane, manually, visually, parameters, and bins

Adding filter, show filter, wildcards, Top N parameters

Discrete and continuous dates

Types of filters: Applying to specific sheets, Editing page shelf

Hiding cards

Sets and parameters in Tableau

Sets

Parameters

Tooltips

Cluster analysis

Formatting

Dashboards and storyboards in Tableau

Building dashboards

Hiding and unhiding sheets

Interface between sheets, dashboard and storyboard

Elements in dashboard

Formatting

Actions in dashboard

Device designer

Story points

Charts in Tableau

Word cloud

Bump charts

Box and whisker

Funnel

Step and Line

Pareto

Waterfall

Donut

Lollipop

Pie

Heat map

Waffle

Show Me charts

Calculations in Tableau

Basic syntax

Regular calc and table calc

Adding totals

Date calc

Logic calc

String calc

Number calc

LODs

Maps in Tableau

Mapbox

WMS>Layers

Converting geo to non-geo

Chart default

Options for maps

Unrecognized locations

Groups

13

Introduction to Cloud Computing

Amazon Web Services Preliminaries - S3, EC2, RDS

Big data processing on AWS using Elastic MapReduce (EMR)

Machine Learning using Amazon SageMaker

Deep Learning on AWS Cloud

Natural Language Processing using Amazon Lex

Analytics services on AWS Cloud

Data Warehousing on AWS Cloud

Creating Data Pipelines on AWS Cloud

14

Introduction to DevOps for Data Science

Tasks in Data Science Development

Deploying Models in Production

Deploying Machine Learning Models as Services

Running Machine Learning Services in Containers

Scaling ML Services with Kubernetes


Case Studies

Facebook – Using Data to Revolutionize Social Networking & Advertising

Facebook has become a hub of innovation, using advanced data science techniques to study user behavior and gain insights that improve its products. With deep learning, Facebook performs facial recognition and text analysis. For facial recognition, Facebook uses powerful neural networks to classify faces in photographs. It uses its own text understanding engine, called “DeepText”, to understand the sentences users write. It also uses DeepText to understand people’s interests and to align photographs with text.

However, beyond being a social media platform, Facebook is primarily an advertising company, and it uses deep learning for targeted advertising.

Amazon – Transforming E-commerce with Data Science

Amazon relies heavily on predictive analytics to increase customer satisfaction, chiefly through a personalized recommendation system. This system is a hybrid that includes collaborative filtering. Amazon analyzes a user’s purchase history to recommend further products, and it also draws suggestions from other users who buy similar products or give similar ratings.

Amazon has an anticipatory shipping model that uses big data to predict which products its users are most likely to buy. It analyzes your purchase patterns and ships products you may want in the future to the warehouse nearest you.
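The user-based side of collaborative filtering described in the Amazon case study recommends items that similar users rated highly. It can be sketched in a few lines of Python. This is a simplified illustration with made-up ratings. It uses cosine similarity over co-rated items and a similarity-weighted average of other users' ratings. It is not Amazon's actual system.

```python
import math

def cosine(a, b):
    # Cosine similarity between two users, computed over the items both rated.
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in common))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_rating(ratings, user, item):
    # Similarity-weighted average of other users' ratings for the item.
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None

def recommend(ratings, user, top_n=2):
    # Score every item the user has not rated yet, highest predicted rating first.
    unseen = {i for r in ratings.values() for i in r} - set(ratings[user])
    scored = [(predict_rating(ratings, user, i), i) for i in unseen]
    scored = [(s, i) for s, i in scored if s is not None]
    return [i for _, i in sorted(scored, reverse=True)[:top_n]]

# Hypothetical ratings on a 1-5 scale, for illustration only.
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "desk": 1},
    "bob": {"laptop": 5, "mouse": 5, "desk": 1, "keyboard": 5, "chair": 2},
    "carol": {"laptop": 1, "mouse": 1, "desk": 5, "keyboard": 2, "chair": 5},
}
print(recommend(ratings, "alice"))  # ['keyboard', 'chair']
```

Alice's tastes match Bob's much more closely than Carol's, so Bob's high keyboard rating dominates. Because every rating here is positive, cosine similarity is always positive. Practical systems often mean-center each user's ratings first so that dissimilar users pull predictions in the opposite direction.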

Related Courses

Data Analysis


Big Data


Data Science


Tableau


Python


Life at Digital Lync

The environment at Digital Lync is colorful and creative. It is where ideas are incubated and generated. An apt place to explore yourself.


Have a Question?

We help you go farther than you ever dreamed, 24/7.
+608011244239

Malaysia


Locations

Come and chat with us about your goals over a cup of coffee
Gachibowli-Hyderabad

1st Floor, Plot No. 6-11, Survey No. 40, Khajaguda, Naga Hills Rd, Madhura Nagar Colony, Gachibowli, Hyderabad, Telangana 500008

Phone: +91 8688444666

Kukatpally-Hyderabad

Address: #106 & 107, Manjeera Trinity Corporate. Near Manjeera Mall, Kukatpally, Hyderabad, Telangana 500072

Phone: +91 8688444666

Malaysia

11, Pusat Dagang Seksyen 16 Seksyen 16, 46350 Petaling Jaya Selangor, Malaysia

Phone: +60 80112 44239

USA

#23664, Richland Grove Dr, Ashburn, VA 20148

Phone: +1-262-997-9000

© Copyright Lync Digital School Pvt. Ltd | 2019 | Privacy Policy
