How to insert a pandas DataFrame to an existing PostgreSQL table?
Last Updated: 22 Nov, 2021
In this article, we are going to see how to insert a pandas DataFrame into an existing PostgreSQL table.
Modules needed
- pandas: A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It consists of three principal components: the data, the rows, and the columns.
- psycopg2: psycopg2 is a PostgreSQL database adapter for Python. PostgreSQL itself is a powerful, open source object-relational database system that runs on all major operating systems, follows the ACID properties, and supports triggers, updatable views, materialized views, and foreign keys.
- sqlalchemy: SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL.
We start the code by importing the packages and creating a connection string of the format:
'postgresql://user:password@host/database'
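A connection string of this form breaks if the password contains characters such as '@' or '/'. The sketch below shows one way to percent-encode the credentials with the standard library before building the string. The helper name build_conn_string and the sample credentials are hypothetical, and the 'postgresql://' scheme is used because current SQLAlchemy releases expect it.

```python
from urllib.parse import quote_plus


def build_conn_string(user, password, host, database, port=5432):
    """Build a PostgreSQL connection URL with percent-encoded credentials.

    Hypothetical helper for illustration; it is not part of pandas,
    psycopg2 or SQLAlchemy.
    """
    return "postgresql://%s:%s@%s:%d/%s" % (
        quote_plus(user),       # encode reserved characters in the user name
        quote_plus(password),   # encode '@', '/', ':' and similar characters
        host,
        port,
        database,
    )


# made-up credentials, only to show the encoding
conn_string = build_conn_string("user", "p@ss/word", "localhost", "data1")
print(conn_string)
```

The resulting string can be passed to create_engine() in the same way as a hand-written one.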
The create_engine() function takes the connection string as an argument and returns an engine for the PostgreSQL database; calling connect() on the engine opens a connection. (SQLAlchemy 1.4 and later no longer accept the older 'postgres://' scheme.) After connecting, we create a dictionary and convert it into a DataFrame using pandas.DataFrame().
The to_sql() method is used to insert a pandas DataFrame into the PostgreSQL table. Finally, we use the execute() method to run our SQL commands and the fetchall() method to fetch the records.
df.to_sql('data', con=conn, if_exists='replace', index=False)
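The arguments are:
- name: the name of the table
- con: the connection (or engine) to use
- if_exists: what to do if the table already exists. 'fail' (the default) raises an error, 'replace' drops the table and recreates it from the DataFrame, and 'append' adds the rows to the existing table.
- index: True or False; whether to write the DataFrame index as a column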
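The difference between 'append' and 'replace' can be seen without a PostgreSQL server. The sketch below uses an in-memory SQLite database as a stand-in, which is an assumption made only so the example runs anywhere; the function name demo_if_exists is hypothetical, while to_sql() is called with the same arguments described above.

```python
import sqlite3

import pandas as pd


def demo_if_exists():
    """Count rows after two 'append' calls and after one 'replace' call."""
    # in-memory SQLite stands in for PostgreSQL in this sketch
    conn = sqlite3.connect(":memory:")
    conn.execute("create table data(Name varchar, Age bigint)")

    df = pd.DataFrame({'Name': ['Tom', 'dick', 'harry'],
                       'Age': [22, 21, 24]})

    # 'append' keeps the existing table and adds the rows each time
    df.to_sql('data', con=conn, if_exists='append', index=False)
    df.to_sql('data', con=conn, if_exists='append', index=False)
    after_append = conn.execute("select count(*) from data").fetchone()[0]

    # 'replace' drops the table and recreates it from the DataFrame
    df.to_sql('data', con=conn, if_exists='replace', index=False)
    after_replace = conn.execute("select count(*) from data").fetchone()[0]

    conn.close()
    return after_append, after_replace


print(demo_if_exists())
```

After the two appends the table holds six rows; after the replace it holds only the three rows of the DataFrame.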
Example 1:
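Insert a pandas DataFrame into an existing PostgreSQL table using sqlalchemy. Note that the code passes if_exists='replace', so to_sql() drops the table and recreates it from the DataFrame; pass if_exists='append' to keep the existing table definition and only add rows. The create table command used to create the table in the PostgreSQL database for the following example is: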
create table data( Name varchar, Age bigint);
Code:
Python3
import psycopg2
import pandas as pd
from sqlalchemy import create_engine

# SQLAlchemy 1.4+ expects the 'postgresql://' scheme
conn_string = 'postgresql://user:password@host/data1'
db = create_engine(conn_string)

# our dataframe
data = {'Name': ['Tom', 'dick', 'harry'],
        'Age': [22, 21, 24]}

# Create DataFrame
df = pd.DataFrame(data)

# db.begin() commits the transaction when the block exits
with db.begin() as conn:
    df.to_sql('data', con=conn, if_exists='replace',
              index=False)

# read the rows back through psycopg2
conn = psycopg2.connect(conn_string)
conn.autocommit = True
cursor = conn.cursor()

sql1 = '''select * from data;'''
cursor.execute(sql1)
for i in cursor.fetchall():
    print(i)

conn.close()
Output:
('Tom', 22)
('dick', 21)
('harry', 24)
Output in PostgreSQL: [Image: output table in PostgreSQL]
Example 2:
Insert a pandas DataFrame into an existing PostgreSQL table without using sqlalchemy. As before, we form a connection to PostgreSQL using the connect() command, then call the execute_values() function from psycopg2.extras, which executes the 'insert' SQL command for all rows. A try-except clause is included so that any errors are caught and the transaction is rolled back.
The program below reads its data from a CSV file named fossilfuels.csv.
The create table command used to create the table in the PostgreSQL database for the following example is:
create table fossil_fuels_c02(year int, country varchar, total int, solidfuel int, liquidfuel int, gasfuel int, cement int, gasflaring int, percapita int, bunkerfuels int);
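The helper in the code for this example builds its INSERT statement with %-formatting, and the doubled %%s can be confusing at first. The short sketch below isolates that step so the resulting query text is visible; the function name build_insert_query is hypothetical and only three of the table's columns are used to keep the output short.

```python
def build_insert_query(table, columns):
    """Return the INSERT template that extras.execute_values() expects."""
    cols = ','.join(columns)
    # %%s survives the formatting step as a literal %s, which
    # execute_values() later replaces with the list of rows
    return "INSERT INTO %s(%s) VALUES %%s" % (table, cols)


query = build_insert_query('fossil_fuels_c02', ['year', 'country', 'total'])
print(query)
# INSERT INTO fossil_fuels_c02(year,country,total) VALUES %s
```

Because the table and column names are interpolated as plain text, this approach is only suitable when they come from a trusted source; psycopg2 also provides the psycopg2.sql module (for example sql.Identifier) for composing identifiers more safely.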
Code:
Python3
import psycopg2
import psycopg2.extras as extras
import pandas as pd


def execute_values(conn, df, table):
    # one tuple per DataFrame row
    tuples = [tuple(x) for x in df.to_numpy()]
    cols = ','.join(list(df.columns))
    # SQL query to execute; %%s becomes the %s placeholder
    # that extras.execute_values() fills with the rows
    query = "INSERT INTO %s(%s) VALUES %%s" % (table, cols)
    cursor = conn.cursor()
    try:
        extras.execute_values(cursor, query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        # undo the failed transaction and report the problem
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("the dataframe is inserted")
    cursor.close()


conn = psycopg2.connect(
    database="ENVIRONMENT_DATABASE", user='postgres', password='pass', host='127.0.0.1', port='5432'
)
df = pd.read_csv('fossilfuels.csv')
execute_values(conn, df, 'fossil_fuels_c02')
conn.close()
Output:
the dataframe is inserted
[Image: PostgreSQL table after inserting the DataFrame]
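One case the helper above does not cover is missing values: pandas reads empty CSV cells as NaN, and a NaN sent to an int column is likely to be rejected by PostgreSQL. A possible pre-processing step, sketched here under the assumption that empty cells should become SQL NULL, converts each NaN to None, which psycopg2 sends as NULL. The function name rows_with_none and the sample data are hypothetical.

```python
import pandas as pd


def rows_with_none(df):
    """Return the DataFrame rows as tuples, with missing values as None."""
    return [
        tuple(None if pd.isna(value) else value for value in row)
        for row in df.itertuples(index=False, name=None)
    ]


# hypothetical sample with one missing value in the 'total' column
sample = pd.DataFrame({'year': [2000, 2001],
                       'country': ['A', 'B'],
                       'total': [10.0, None]})
rows = rows_with_none(sample)
print(rows)
```

The returned list could be used in place of the tuples list that is passed to extras.execute_values().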