How to build an ETL pipeline with Python | Data pipeline | Export from SQL Server to PostgreSQL

How to build an ETL pipeline with Python | Data pipeline | Export from SQL Server to PostgreSQL

BI Insights Inc

2 года назад

170,929 Просмотров

In this video, we will discuss what ETL is. ETL stands for Extract, Transform, Load. ETL is a set of processes that extracts data from one or more sources (API, a database or a file), transforms it to match the destination system’s requirements and loads it into the destination system.

In this tutorial we will see how to design ETL Pipeline with Python. We will use SQL Server’s AdventureWorks database as a source and load data in PostgreSQL with Python. We often have requirements for data analytics and/or reporting projects that requires optimized data for querying. ETL moves and amalgamate the data from various sources and stores in the destination where it is available for data analytics, reporting. Furthermore, data scientist and data analysts can use this optimized data come up with new findings.

Link to medium article on this topic: https://medium.com/@hnawaz100/how-to-build-an-etl-pipeline-with-python-1b78407c3875

Link to Microsoft ODBC driver: https://www.microsoft.com/en-us/download/details.aspx?id=36434

Link to ETL Script:
https://github.com/hnawaz007/pythondataanalysis/blob/main/ETL%20Pipeline/build_etl_pipeline_python.py

Link to GitHub repo: https://github.com/hnawaz007/pythondataanalysis/tree/main/ETL%20Pipeline

Link to the Python ETL series: https://hnawaz007.github.io/etl.html

How to connect to SQL Server via Python: https://www.youtube.com/watch?v=zdezE6TWSQQ&t



SQL Server install video: https://www.youtube.com/watch?v=e5mvoKuV3xs&t
PostgreSQL Install video: https://www.youtube.com/watch?v=fjYiWXHI7Mo&t

Subscribe to our channel:
https://www.youtube.com/c/HaqNawaz

---------------------------------------------
Follow me on social media!

Github: https://github.com/hnawaz007
Instagram: https://www.instagram.com/bi_insights_inc
LinkedIn: https://www.linkedin.com/in/haq-nawaz/

---------------------------------------------

#ETL #Python #SQL

Topics covered in this video:
0:00 - Introduction to ETL
1:55 - PostgreSQL Setup
2:49 - Sqlserver Setup
3:32 - Code ETL Pipeline
5:10 - Data Extract from SQL Server
7:30 - Load data to PostgreSQL
9:22 - Test ETL Pipeline

Тэги:

#data_pipelines #what_is_data_pipeline #what_is_data_pipeline_architecture #data_pipeline_vs_etl #how_to_build_a_data_pipeline #how_to_create_data_pipeline #batch_vs_stream_processing #big_data #data_warehouse #ETL #How_to_build_ETL_pipeline #build_etl_pipeline_with_Python #SQL_Server #build_etl_pipeline_with_pyton #python_data_integration #Haq_Nawaz #ETL_Incremental_data_load #Export_data_from_SQL_Server_to_Postgres #ETL_with_Python
Ссылки и html тэги не поддерживаются


Комментарии:

Parab
Parab - 14.08.2023 12:46

Thanks very much for your invaluable contribution towards eternal learning process.

Ответить
GB~
GB~ - 07.08.2023 17:11

Hello Haq,

I keep having these 2 errors while executing the .py file in VS:

1."Error Locating Server/Instance Specified [xFFFFFFFF]."
Already stackoverflowed it and changed to Automatic the SQL Browser startup mode. The Service is also running now.

2.Also there's this one: "Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections"

For this 2nd one I did two changes: First, I checked the name of the server with query "SELECT @@SERVERNAME;" and it checks out when I found it in Server Properties in SSMS. Your server name looks quite different from mine ( laptop name). I slightly think my issue comes from my server name somehow.
Secondly, I allowed remote connections under the menu Server Properties in SSMS. I think I'm quite close. Any suggestions?

Hope I presented the question in a neat and understandable way.
In any case, congratulations and many thanks for these series of tutorials, I'm quite close to make it!

Cheers,

NB: I've completed with same results as you the 2 previous videos on SQL Server and Postgre installation, thanks again!

Ответить
Diogo Bastos
Diogo Bastos - 05.07.2023 15:06

txs man!!!

Ответить
ranvijay mehta
ranvijay mehta - 02.07.2023 14:05

Thank you so much

Ответить
Caribou Data Science
Caribou Data Science - 23.06.2023 23:28

Why not use the SQL functions in pandas?

Ответить
Ferruccio Guicciardi (Ferruccio_Guicciardi)
Ferruccio Guicciardi (Ferruccio_Guicciardi) - 16.06.2023 11:02

Thanks for sharing 😀

Ответить
Linh Tuấn
Linh Tuấn - 01.06.2023 20:37

Hi,bro. Video so cool
I have trouble, can you help me. i installed msodbcsql. Tks you so much
Extract error:('IM002', '[IM002] [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified (0) (SQLDriverConnect)')
Data Extract Error: 'src_connect' is not defined

Ответить
Flying Dutchman
Flying Dutchman - 25.05.2023 20:39

Fantastic videos, short and to the point! Great work, thank you for sharing!

Ответить
Nicol Posthumus
Nicol Posthumus - 23.05.2023 11:58

Thanks

Ответить
Nicol Posthumus
Nicol Posthumus - 23.05.2023 11:56

Thank you for this please continue to make these informative videos, you are a very good illustrator

Ответить
Umang Pandya
Umang Pandya - 20.05.2023 22:10

This is cool! But isn't this more of an Extract and Load process than Extract, Transform and Load process? Most of the time while creating the Data Marts transform scripts is a heck and long. Also in the same scenario the Load procedure is mainly for insert query of records than tables. Nevertheless good video!

Ответить
SMC GPRA
SMC GPRA - 16.05.2023 20:21

Sometimes int will auto convert to float in pandas and get error

Ответить
prateek mishra
prateek mishra - 12.05.2023 15:45

Any suggestions or available code which can copy the compatible data type from MySQL to postgres..?? Please respond

Ответить
The Man
The Man - 21.04.2023 20:47

Hey, great video. I was wondering if this would be a good guide for creating a basic pipeline project for job interviews’s like Data Analysis/Engineering? Thank you.

Ответить
Shibly Aziz
Shibly Aziz - 16.04.2023 17:56

Exactly what I was looking for.Very well done! Thank you

Ответить
Shrutika
Shrutika - 06.04.2023 03:27

Data sources are sap hana and sql server. My target table is same sql server. How can I perform upsert. I do have primary key on target table but it might be possible other columns might update in future . Please help

Ответить
Mister M
Mister M - 31.03.2023 11:23

Thanks! Subscribe and like buttons done.

Ответить
Aurav Kar
Aurav Kar - 30.03.2023 08:55

Where is the transform part

Ответить
matheus bispo
matheus bispo - 23.03.2023 23:21

houly shit

Ответить
Graham Lindsay
Graham Lindsay - 21.03.2023 12:44

I'm impressed!

Ответить
HeroicKhalid
HeroicKhalid - 18.03.2023 23:47

I'm still don't see benefit from it why need to make it like this if we can extract from source dirctly

Ответить
Dean Marucci
Dean Marucci - 01.03.2023 22:11

Such an easy to follow video, fantastic tutorial!

Ответить
Avinash
Avinash - 23.02.2023 22:09

What is the ETL tool you used data injection and extraction?

Ответить
Nadeschda t
Nadeschda t - 02.02.2023 22:40

Is this also working if ODBC Driver will be used?

Ответить
balaji sivam
balaji sivam - 27.01.2023 10:49

Hi bro, very informative session.. am working on the ETL QA framework creation using pyspark. for that I have to create a directory structure in pycharm.. is there any reference video you have created? Please share

Ответить
Spirits & Such
Spirits & Such - 18.01.2023 11:43

Hello, I am getting the following error.

Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")
Error while extracting data: local variable 'src_conn' referenced before assignment

driver is installed and I have done the other troubleshooting steps found in the previous comments.

Ответить
Gralle G
Gralle G - 08.01.2023 14:36

Thanks a lot !

Ответить
David
David - 30.12.2022 03:07

great great video

Ответить
Arin Bose
Arin Bose - 13.12.2022 10:13

THIS IS AMAZING THANKS!!!
HELP!!!!

1)I did the same thing and it got importeed into postgres, But i need the column data types to be in specif cformat for example Datetime, VARCHAR, DOUBLE ,INT not these text,bigint etc. Please suggest what should i do!!

2) Data is same but the data types are different, i tried changing the data type in the data frame and created custom columns with CREATE sql query but it doesnt matter as pd.to_sql always replaces that table adn creates a new one.

So what should i do?
Thank you

Ответить
motamarri ajaykumar
motamarri ajaykumar - 10.12.2022 10:17

If my destination is Dimentions and fact table then I need to write separate method for table or is there any other way

Ответить
motamarri ajaykumar
motamarri ajaykumar - 10.12.2022 10:07

Hi sir,
Can you make a video how to migrate data from SQL server to snowflake

Ответить
Duy Nguyen Duc
Duy Nguyen Duc - 27.11.2022 18:21

I encountered this error while running the file: raise KeyError(key) from None at code 'uid' and 'pwd'. Could you help me with this? Also a very nice video! Thank you

Ответить
Ihab
Ihab - 24.11.2022 19:01

hai friend.. I have problem. in the sql server name is ENGINEERDATA\ENGDATA and when I'm declared variable ='ENGINEERDATA\ENGDATA' in code when I run the scripts always appear pycopg2.operationalError couldn't translate host name "engineerdata\ENGDATA" to address :unknow host.
please help me

Ответить
Rangarajan N
Rangarajan N - 25.10.2022 09:17

This is just EL.

Ответить
Ernest Mukaru
Ernest Mukaru - 23.10.2022 12:19

So cool!! where does transform happen?

Ответить
Sanjog SIgdel
Sanjog SIgdel - 22.10.2022 13:26

Helpful. Thank you for this video

Ответить
Noor Ali Ikhwan
Noor Ali Ikhwan - 19.10.2022 06:56

hello sir I got some error and I don't get any clue through it after trying other alternative.

Data extract error: ('28000', "[28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456) (SQLDriverConnect); [28000] [Microsoft][SQL Server Native Client 11.0][SQL Server]Login failed for user 'etl'. (18456)")error
Error while extracting data: local variable 'src_conn' referenced before assignment


I really appreciate your help!!!!

Ответить
Mukul Raj
Mukul Raj - 10.10.2022 19:40

Perfect explanation. 👍👍

Ответить
PS Points Sara
PS Points Sara - 07.10.2022 12:59

Excellent presentation. Keep it up.

Ответить
Oberdan Santos
Oberdan Santos - 01.10.2022 12:41

Explicação muito boa. Vc tem algum tutorial extraindo dados de um ERP?

Ответить
Nobel Siwela
Nobel Siwela - 27.09.2022 14:03

Top tier staff this!!!

Ответить
Bilal Muhammad
Bilal Muhammad - 20.09.2022 20:19

I was looking for this. Thankyou!!. One question about scheduler. If script is run every day at certain time, everytime script ll run the query "select * form table" query? This means whole dataset in database is loaded everytime when script runs?

Ответить
Kine Lekaite
Kine Lekaite - 18.09.2022 17:43

great stuff thanks

Ответить
Nabil Rahmouni
Nabil Rahmouni - 16.09.2022 11:08

amazing Video, I followed all but I had this error and could not find solution
Data extract error: ('08001', '[08001] [Microsoft][ODBC Driver 17 for SQL Server]Neither DSN nor SERVER keyword supplied (0) (SQLDriverConnect); [08001] [Microsoft][ODBC Driver 17 for SQL Server]Invalid connection string attribute (0)')
Error while extracting data: local variable 'src_conn' referenced before assignment

Ответить