Data Analysis Using Python

Course Code

IN1546

Duration

3 Days

To maximize time allotted to core discussions, this course DOES NOT include introductory topics on the Python language or Python fundamentals. It is essential that participants either take an introductory Python course (such as the Intensive Introduction to Python course) or ensure they have sufficient experience with the language beforehand. Knowledge of the following topics prior to this course is required:

  • Python data types and control structures, including the use of 'with'
  • Creating and calling Python functions
  • PYTHONPATH, high-level knowledge of the standard library, and pip
  • Familiarity with executing Python scripts from a command line or within an IDE

With Python increasingly used as a primary development language and numerous third-party tools now supporting data analysis in Python, this 3-day course provides a solid understanding of the tools and fundamentals of data analysis using Python. It offers a thorough treatment of Pandas, NumPy, and Matplotlib, as well as other data analysis-related tools.

Using Pandas, participants will examine techniques for acquiring data from numerous sources, including databases, Excel, text files, and remote locations. The course explores the creation, transformation, and manipulation of large data sets, including techniques for working with DataFrames, split-apply-combine, multi-level indexing, regressions, and more. Numerous NumPy methods are examined, as is the creation of data visualizations using Matplotlib and other tools such as Seaborn. Environments such as IPython and Jupyter are used.
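
As a taste of the split-apply-combine technique mentioned above, here is a minimal sketch using the Pandas groupby API. The sales data, column names, and variable names are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical sales records; the column names and values are illustrative only.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West", "West"],
        "product": ["A", "B", "A", "A", "B"],
        "units": [10, 4, 7, 3, 5],
    }
)

# Split the rows by region, apply a sum to each group, combine into one Series.
units_by_region = sales.groupby("region")["units"].sum()

# Grouping by two keys produces a result with a two-level (multi-level) index.
units_by_region_product = sales.groupby(["region", "product"])["units"].sum()

print(units_by_region)
print(units_by_region_product.loc[("West", "A")])
```

Grouping by more than one key is one common way a multi-level index arises, which is why the two topics are often taught together.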

Finally, techniques for improving performance using threads, asyncio, and other Python extensions are examined.
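
To give a rough idea of what measuring and improving performance with threads can look like, the sketch below times the same simulated I/O-bound work run sequentially and then on a thread pool, using only the standard library. The `timed` context manager, the `fetch` function, and the source names are hypothetical stand-ins, and the sleep call merely simulates waiting on a file, database, or network read.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record the wall-clock duration of the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def fetch(source):
    """Hypothetical stand-in for an I/O-bound read (file, database, URL)."""
    time.sleep(0.2)  # simulates time spent waiting on I/O
    return f"data from {source}"


sources = ["db", "excel", "csv", "web"]
timings = {}

# Baseline: one read after another.
with timed("sequential", timings):
    sequential = [fetch(s) for s in sources]

# Same reads, overlapped on a thread pool; map() preserves input order.
with timed("threaded", timings):
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        threaded = list(pool.map(fetch, sources))

print(timings)
```

Threads tend to help most when tasks spend their time waiting, as in this simulation; CPU-bound calculations usually call for other approaches, such as multiple processes or compiled extensions.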

PLEASE NOTE THE PREREQUISITES ABOVE!
This course is designed for data engineers, software developers, software architects, technical managers, sales representatives, and anyone looking to improve their Python data analysis skills.

Upon completion of this course, participants will be able to:

  • Leverage third-party tools to solve problems
  • Create rich, powerful data visualizations
  • Read large sets of data from different data sources
  • Transform, manipulate, and analyze data sets
  • Incorporate techniques to improve performance and decrease calculation times

Day 1
Data Analysis Tools and NumPy

The Anaconda Python Distribution
Anaconda Packages and conda
IPython and Jupyter Environments
Running Notebooks
Magic Commands
Introducing NumPy and SciPy
The NumPy Array
Understanding dtype, shape, and strides
Creating Arrays
Stacking Techniques
Slicing
Array Generation
Tiling
Exercises: Creating ndarrays

More with NumPy
NumPy Functions
Aggregation Functions
Reshaping
Column and Row Indexing
Understanding Boolean Indexing
Comparing Arrays
Sorting and Iterating Arrays
Reading from Text Files
Using genfromtxt()
Exercise: Sports Statistics Analysis

Day 2
Introducing Matplotlib

Python Data Visualization Overview
Configuring Matplotlib for Jupyter
Creating Plots
Plot Options and Variations
Line Graphs, Bar Graphs
BarH Graphs, Scatter Plots
Pie Charts, TwinX Bar Graphs
Plotting Multiple Data Sets
Using Subplots
Managing Axes
Customizing Layouts
Legends and Annotations
LaTeX Markup
Saving Figures
Exercise: Plotting with Matplotlib

Working with Pandas
Creating Pandas Series
Accessing, Manipulating Series
Plotting with Pandas
Creating DataFrames
DataFrame Info
Indexing DataFrames
loc[], iloc[], ix[]
Sorting DataFrames
read_csv()
Exercise: Using Pandas DataFrames

Day 3
More with Pandas DataFrames

Displaying DataFrames
Merging Data with DataFrames
MultiIndex DataFrames
More with Boolean Indexing
Copying DataFrames
Deleting, Manipulating Columns
Split-Apply-Combine Operations
Using GroupBy
Reading from Excel Spreadsheets
apply(), map(), applymap()
Exercise: Grouping and Manipulating Data
Cut and Binning
Cross-Tabulation
Getting Unique Values
Pandas Panels

Pandas, the Database, and Other Sources
Python DB API 2.0
Fetch and Execute Operations
Pandas and the Database
read_sql() Options
Exercise: Working with Pandas and the Database
Using read_json() and read_html()

Other Python Tools for Data Analysis
Seaborn
Bottleneck
Dask
Numexpr

Improving Performance with Python Multi-tasking for Data Analysis
Measuring Performance
Jupyter and %timeit
Context Managers and Measuring Time
Using Threads
Multi-processing vs Threads
Asyncio
Using Cython
Parallel Task Programming using Dask