Machine Learning

Scott
3 min read · Mar 31, 2020

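In conventional programming, you write the rules yourself: if you give the program an input, it gives you an output.

In machine learning, you train a model on previous data so that it can predict the output for future inputs.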
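As a rough sketch of that difference, the snippet below first hard-codes a rule and then recovers the same kind of rule from past data. The data points, the function name double_plus_one, and the choice of a straight-line fit with np.polyfit are all assumptions made for illustration, not something taken from the notes above.

```python
import numpy as np

# Conventional programming: we write the rule ourselves.
def double_plus_one(x):
    return 2 * x + 1

# Machine learning: we only have past inputs and outputs (made-up data here).
past_inputs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
past_outputs = np.array([1.0, 3.0, 5.0, 7.0, 9.0])

# Fit a straight line to the past data (degree-1 least squares).
slope, intercept = np.polyfit(past_inputs, past_outputs, deg=1)

# Use the fitted line to predict the output for an input we have not seen.
future_input = 10.0
prediction = slope * future_input + intercept
print(round(prediction, 2))
```

Real models are usually more involved than a straight line, but the shape is the same: fit on old data, then predict for new inputs.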

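Cleaning data, or data wrangling, means converting raw values into numbers a model can work with, for example converting text (ASCII) to integers.

You have to convert character values to integers, and you have to deal with garbage values and null values so that every data point ends up as a usable number.

Garbage values are outliers, impossible numbers and mistakes.

Null values are blank or missing data points.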
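Here is one possible way to handle all three cases in pandas. The column names, the sample values, the rule that a negative age counts as garbage, and the choice to fill gaps with the mean are assumptions for the sake of the example; other data sets will need other rules.

```python
import numpy as np
import pandas as pd

# Made-up data: a text column, an impossible age, and a missing age.
raw = pd.DataFrame({
    "gender": ["male", "female", "female", "male"],
    "age": [34.0, -5.0, np.nan, 41.0],
})

clean = raw.copy()

# Character values -> integers, using an explicit mapping.
clean["gender"] = clean["gender"].map({"male": 0, "female": 1})

# Garbage values: treat impossible ages (negative here) as missing.
clean.loc[clean["age"] < 0, "age"] = np.nan

# Null values: fill with the mean of the valid ages, then cast to integers.
clean["age"] = clean["age"].fillna(clean["age"].mean()).astype(int)

print(clean)
```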

Some Libraries

Pandas → pd

  • Imports data files, for example reading a CSV, and helps clean the data.

Matplotlib (matplotlib.pyplot) → plt

NumPy → np

  • Helps find the mean, max, min and standard deviation for matrices.
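For instance, the NumPy statistics mentioned above could be computed like this. The matrix values are made up for illustration.

```python
import numpy as np

# Made-up 2 x 3 matrix.
matrix = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

print(matrix.mean())        # mean of all values
print(matrix.max())         # largest value
print(matrix.min())         # smallest value
print(matrix.std())         # standard deviation of all values
print(matrix.mean(axis=0))  # mean of each column
```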

You can write the full name or the shorthand, but the shorthand is standard.

pandas.DataFrame(data) # full name
pd.DataFrame(data) # shorthand

You can search Google using either form too.

You bring a library into your script with the import statement.

import pandas as pd
import math as m

This imports the library and gives it a shorter alias in the same line.
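A quick way to convince yourself that the alias and the full name are the same thing is shown below. The small TV table is made up for the example.

```python
import pandas
import pandas as pd

# Both names point at the same module object.
print(pd is pandas)

# So a table built with the full name matches one built with the shorthand.
full = pandas.DataFrame({"TV": [1, 2]})
short = pd.DataFrame({"TV": [1, 2]})
print(full.equals(short))
```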

In Anaconda (for example in Spyder), you can run your code and then access its variables in the console.

df = pd.DataFrame(a1)
df1 = pd.read_csv("Location of file")
# Go to the file's location, copy the path, paste it, and add the file name (name.csv) after the path.
For example:
df1 = pd.read_csv("/Users/scottlydon/Desktop/currentTransaction_8789.csv")

By default, read_csv uses the first row as the column header, even if your data set doesn't have any column names.

If you don't want that to happen:

df = pd.read_csv("path", header=None)

If you want to assign your own column names:

df = pd.read_csv("path", header=None, names=["index1", "TV"])

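If you give fewer names than there are columns, check the result carefully. pandas does not simply fill the names in from left to right: the names go to the rightmost columns, and the leftover leading columns are used as the index.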

If you don't want all the rows:

df = pd.read_csv("path", nrows=100)

If you want to skip rows at the top of the file:

df = pd.read_csv("path", skiprows=3)
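To see these options side by side without needing a file on disk, the sketch below reads the same made-up CSV text several times through io.StringIO. The text and the variable names are assumptions for illustration; with a real file you would pass its path instead.

```python
import io
import pandas as pd

# Made-up CSV text with no header row: 5 rows, 2 columns.
csv_text = "1,230.1\n2,44.5\n3,17.2\n4,151.5\n5,180.8\n"

# Default: the first row is used up as the header, leaving 4 data rows.
default_df = pd.read_csv(io.StringIO(csv_text))

# header=None keeps all 5 rows and numbers the columns 0, 1.
no_header_df = pd.read_csv(io.StringIO(csv_text), header=None)

# names= supplies our own column names.
named_df = pd.read_csv(io.StringIO(csv_text), header=None, names=["index1", "TV"])

# nrows= reads only the first rows; skiprows= skips rows at the top.
first_two = pd.read_csv(io.StringIO(csv_text), header=None, nrows=2)
skipped = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=3)

print(default_df.shape, no_header_df.shape, first_two.shape, skipped.shape)
print(list(named_df.columns))
```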

In Spyder (Anaconda) you can set the working directory, and then use shorter, relative paths when you call read_csv().

If you are trying to read an Excel file, use pd.read_excel() instead.

You can save (write) your data to a file on your desktop or at another path:

df.to_csv("example.csv")

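Here df is the table (DataFrame) you assigned earlier. With a bare file name like this, the file is saved to the working directory that you set globally in the top right of Spyder.

If that directory is the desktop, this saves the CSV to the desktop and names it example.csv.

If you get an error because of the backslashes in your path, put r in front of the string. This makes it a raw string, so Python keeps the backslashes exactly as written instead of reading them as escape characters.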

df.to_csv(r"path")
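The effect of the r prefix can be checked with plain strings, no pandas needed. The Windows-style path below is hypothetical and only there to trigger the escape characters.

```python
# A hypothetical Windows-style path, used only for illustration.
normal = "C:\new_folder\table.csv"   # \n and \t are read as newline and tab
raw = r"C:\new_folder\table.csv"     # backslashes are kept exactly as typed

print("\n" in normal, "\n" in raw)
print(repr(normal))
print(repr(raw))
```

On macOS or Linux paths use forward slashes, so this problem mostly shows up with Windows paths.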

If you don't want the index to be written as a column, set index=False:

df.to_csv("path", index=False)
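One way to see what index=False changes is to write the same table twice and read both files back. The table contents and file names are made up, and the files go into a temporary folder so nothing real is touched.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({"TV": [230.1, 44.5], "radio": [37.8, 39.3]})

# Write into a temporary folder so the example does not touch real files.
with tempfile.TemporaryDirectory() as folder:
    with_index = os.path.join(folder, "with_index.csv")
    without_index = os.path.join(folder, "without_index.csv")

    df.to_csv(with_index)                  # index is written as the first column
    df.to_csv(without_index, index=False)  # index is left out

    cols_with = list(pd.read_csv(with_index).columns)
    cols_without = list(pd.read_csv(without_index).columns)

print(cols_with)     # has one extra, unnamed column in front
print(cols_without)
```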

A relative path is resolved from the current working directory, which is often the folder where your script lives. To save anywhere else, you need to provide a complete path starting from /Users/.

df.to_csv("/Users/scottlydon/Documents/Exampleyyyyyy.csv", index=False)
df.head(10) # shows the first 10 rows, defaults to 5
df.tail() # shows the last 5 rows
df.shape # (number of rows, number of columns)
df.shape[0] # number of rows
df.shape[1] # number of columns
df.columns # column names, e.g. "head1", "head2", etc.
df.dtypes # each column name with its type
df # shows the whole table
df['columnName'] # gives the specified column, with its name and type at the end
df[['column1', 'column2']] # gives two columns; note the double brackets. The index takes only one value, so to ask for several columns you pass a list
df['radio'].dtypes # gives the type of the column
df.radio # also works, but we don't usually use it because it fails for column names that contain spaces
df.drop("radio", axis=1) # axis=1 means column, axis=0 means row

Calling drop returns a new table. It doesn't modify the current one.

df.drop("radio", axis=1) # will not update df
df = df.drop("radio", axis=1) # will update df
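The inspection calls and the drop behaviour can be tried together on a small table. The advertising-style numbers and the sales column are made up for the example.

```python
import pandas as pd

# Made-up data; the values and the "sales" column are assumptions.
df = pd.DataFrame({
    "TV": [230.1, 44.5, 17.2],
    "radio": [37.8, 39.3, 45.9],
    "sales": [22.1, 10.4, 9.3],
})

print(df.shape)             # (rows, columns)
print(list(df.columns))     # column names as a plain list
print(df[["TV", "sales"]])  # a list of names selects several columns

df.drop("radio", axis=1)    # result is thrown away, df is unchanged
still_there = "radio" in df.columns

df = df.drop("radio", axis=1)  # reassigning keeps the result
gone = "radio" not in df.columns

print(still_there, gone)
```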
