Course Description

This course focuses on utilizing a popular programming language in the analytics domain with associated tools for the design and construction of analytics applications. Emphasis is on the use of structured techniques and the integration of application libraries for data extraction, model implementation and visualization.

Cheat Sheet

The following document is a cheat sheet for Python 3.

Cheat Sheet

Programs

The following are python programs.

BMI

The following calculates BMI using English units and converts them to metric and prints the resulting BMI.

BMI

Paint

Design a program to estimate the gallons of paint needed to paint a wall and the estimated cost of the paint. Assume that one gallon of paint will cover 350 square feet of a wall, each doorway accounts for 14 square feet, and each window accounts for 8.5 square feet. Obtain all other required information from the user. The output of your program must include the total cost to purchase the paint and the number of gallons needed. Additional output may include the square footage of the wall and allowances made.

Paint

Growth Rate

A biologist needs a program to predict population growth. The inputs are: 1) The initial number of organisms, as an int 2) The rate of growth (a real number greater than 1), as a float 3) The number of hours it takes to achieve this rate, as an int 4) The number of hours during which the population grows, as an int Write a program that takes these inputs and displays a prediction of the total population.

Growth Rate

Text Files Single Item on a Line

Create a program to process the file data.txt Find the following pieces of information about the file:

Total lines in the file Min Max Mean Missing (-1)

Single Item on a Line

The following text file is the data used to test the program.

Data

Text Files Multiple Items on a Line

Construct a program to copy all records with a user entered integer value found in field X4 to a new file.

Multiple Items on a Line

The following csv file is the data used to test the program.

Moxy

TermVector

A term frequency vector is a list of the words found in a document with a count of how many times each word occurs in the document. Create a program to find the term frequency vector of a document stored in a text file.

TermVector

The following text file is the data used to test the program.

TheTortoiseAndTheHare

BMI with Functions

Revamp the BMI program: * Include a main function * Move conversions to functions * Make a function to do the BMI calculation.

BMI

Pandas

Clean and explore the Boone.csv weather data.

Tasks:

  • Convert fields to correct types
  • Drop useless columns
  • Rename columns: Remove underscores and use pascal casing
  • How much data is missing?
  • Look for relationships using corr()
  • Save data to boone_clean.csv

Pandas

The following csv file is the data used.

Boone Weather

Average Year Weather Data

Utilizing the cleaned Boone weather data, create a dataset representing an average year. Impute values as you see fit, but be sure you can justify your decisions.

Average Year Weater

The following csv file is the data used.

Boone Clean

The following csv file is the results for the year average.

Average Year

Visualizations

Using the cleaned data for store 11, create the following visuals in a notebook:

  • Scatter plot Sales and Labor_Cost
  • Line plot Date and Item_Labor
  • Create a histogram for a column of your choice.
  • Create a bar chart using Bad_Weather_Days and Sales

Visualizations

The following csv file is the data used to create the visualizations.

Store 11