Python Pandas Dataset Analysis: Sorting, Subsetting, Unique Elements, Value Counts, and beyond!

Saniya Khullar

2 years ago

160 views

This video uses Python's pandas package (free!) to better understand a dataset. Saniya covers how to import pandas, read a dataset into a Jupyter notebook, and perform other key pandas operations: sorting (by one or more columns), subsetting (selecting rows that meet certain criteria), finding the unique elements in a column, computing value counts, and beyond! This is an initial exploration of working with pandas to better understand data. Saniya works with a deforestation dataset, "annual-change-forest-area.csv", from Kaggle (https://www.kaggle.com/chiticariucristian/deforestation-and-forest-loss) in support of efforts to improve climate change outcomes and reduce deforestation (so the real pandas and other critters can have their forest homes preserved!).
Saniya also talks a little about where to get datasets to practice on (for learning or for competitions on crowd-sourcing sites like kaggle.com).
Please reach out to Saniya with any questions you have, and please subscribe to Saniya's YouTube channel for more updates :)

In short, we will learn how to use Python's pandas to better understand deforestation datasets (so we can eventually help protect panda homes). Please note this is for Python 3.

Saniya hopes to make more Python videos. Here, we learn Python pandas tools like:
* import pandas as pd (nickname for pandas)
* import numpy as np
* reading in a dataset (csv file) to pandas
* get # of columns and rows for dataset
* view first 5 rows (head of dataset)
* sort dataframes based on columns
* extract columns from a dataframe
* look for and retrieve all unique elements in columns
* finding value counts (# of times each item appears in a pandas column)
* subsetting dataframe based on criteria (relational operators and .isin())
* and beyond!
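
The tools listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook from the video: the rows are made up so the snippet runs on its own, and the column names ("Entity", "Year", "Net forest conversion") are assumptions based on the columns mentioned in this description. With the real file, you would pass its path to pd.read_csv instead.

```python
import io
import pandas as pd

# Made-up rows for illustration only; these are NOT real figures from the Kaggle file.
# Column names are assumed from the columns mentioned in the video description.
csv_text = """Entity,Year,Net forest conversion
Brazil,1990,-100
Brazil,2000,-200
China,1990,50
China,2000,80
India,2000,20
"""

# With the downloaded file: df = pd.read_csv("annual-change-forest-area.csv")
df = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns
n_rows, n_cols = df.shape

# First 5 rows (the default for head)
first_rows = df.head()

# Sort by one or more columns: newest year first, then country name A to Z
by_year = df.sort_values(["Year", "Entity"], ascending=[False, True])

# Unique elements in a column
years = df["Year"].unique()

# Number of times each item appears in a column
counts = df["Entity"].value_counts()

# Keep only rows whose Entity is in a list of countries
subset = df[df["Entity"].isin(["Brazil", "India"])]

print(n_rows, n_cols)
print(counts)
```

Note that sort_values returns a new dataframe and leaves df unchanged, so the sorted result has to be assigned to a name if you want to keep it.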

TIME STAMPS
00:00 Python Pandas Dataset Analysis: Sorting, Subsetting, Unique Elements, Value Counts, and beyond!
01:14 What is deforestation (explaining context for dataset)
02:05 What is Pandas package in Python?
02:44 Kaggle Dataset on Deforestation and Forest Loss (used for dataframe in this video)
03:11 Loading the dataset into Excel (dataframes are standard rows and columns of data)
03:49 Only 1,048,576 rows can be loaded into Excel (Pandas can load in millions of rows!)
04:38 Explaining more about the dataset
05:20 Public Service Announcement: Why Deforestation and Forest Loss are Big Concerns
06:52 Loading up Python Jupyter notebook and datafiles (importing pandas and numpy)
08:18 Converting Pandas Dataframe column (e.g. Year column) to a list
09:14 finding the dimensions (shape) of a dataframe (# rows, # columns)
10:03 finding the unique elements in a pandas column (e.g. Year column, Entity column) using sets
13:37 showing first 5 rows of dataframe (default) using head function
13:58 sorting dataframe by a column (e.g. Year)
17:03 sorting a list using sorted function
17:47 subset/filter dataframe for certain rows for 1 value (e.g. for specific country)
19:05 using value counts to find breakdown of counts of unique values for a given column
19:53 subset/filter dataframe for certain rows for many values (e.g. list of countries)
22:16 recap on index numbers for a Python list (positive and negative)
24:19 explaining the .isin() function
29:00 subsetting/filtering for net forest conversion and year using greater than or less than operators
30:43 selecting only certain columns from pandas dataframe
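
A few of the later steps in the time stamps (converting a column to a list, finding unique values with a set, sorting with sorted, list indexing, relational filters, and column selection) might look like the sketch below. Again, the data values are invented for illustration and the column names are assumptions, so treat this as a rough outline rather than the exact code shown in the video.

```python
import pandas as pd

# Made-up rows for illustration only; column names are assumed, not confirmed.
df = pd.DataFrame({
    "Entity": ["Brazil", "Brazil", "China", "China", "India"],
    "Year": [1990, 2000, 1990, 2000, 2000],
    "Net forest conversion": [-100, -200, 50, 80, 20],
})

# Convert a dataframe column to a plain Python list
year_list = df["Year"].tolist()

# A set keeps each value once; sorted() returns an ordered list
unique_years = sorted(set(year_list))

# List indexes: 0 is the first item, -1 is the last item
earliest_year = unique_years[0]
latest_year = unique_years[-1]

# Filter rows for one value
brazil = df[df["Entity"] == "Brazil"]

# Combine relational operators with & (each condition needs its own parentheses)
losing_after_1990 = df[(df["Net forest conversion"] < 0) & (df["Year"] > 1990)]

# Select only certain columns by passing a list of column names
entity_year = df[["Entity", "Year"]]

print(unique_years)
print(losing_after_1990)
```

The parentheses around each condition matter because & binds more tightly than < and > in Python.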

Tags:

#python #pandas #pandas_python #dataframe #subsetting #value_counts #isin #sorting #ordering #pandas_data_analysis #climate_change #deforestation_dataset #understanding_data_in_python #python_for_beginners #pandas_for_beginners #unique_items_python #read_csv_pandas #value_counts_dataframe #subset_dataframe #pandas_operations #import_pandas_as_pd #import_numpy_as_np #getting_started_with_python #getting_started_with_pandas #kaggle_data #public_data #kaggle_python #python_notebook