How to insert a pandas DataFrame to an existing PostgreSQL table?

Last Updated : 22 Nov, 2021

In this article, we are going to see how to insert a pandas DataFrame to an existing PostgreSQL table.

Modules needed

pandas: Pandas DataFrame is two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). A Data frame is a two-dimensional data structure, i.e., data is aligned in a tabular fashion in rows and columns. Pandas DataFrame consists of three principal components, the data, rows, and columns.
psycopg2: PostgreSQL is a powerful, open source object-relational database system. PostgreSQL runs on all major operating systems. PostgreSQL follows ACID property of DataBase system and has the support of triggers, updatable views and materialized views, foreign keys.
sqlalchemy: SQLAlchemy is the Python SQL toolkit and Object Relational Mapper that gives application developers the full power and flexibility of SQL

we start the code by importing packages and creating a connection string of the format:

'postgres://user:password@host/database'

The create_engine() function takes the connection string as an argument and forms a connection to the PostgreSQL database, after connecting we create a dictionary, and further convert it into a dataframe using the method pandas.DataFrame() method.

The to_sql() method is used to insert a pandas data frame into the Postgresql table. Finally, we execute commands using the execute() method to execute our SQL commands and fetchall() method to fetch the records.

df.to_sql('data', con=conn, if_exists='replace', index=False)

arguments are:

name of the table
connection
if_exists : if the table already exists the function we want to apply . ex: 'append' help us add data instead of replacing the data.
index : True or False

Example 1:

Insert a pandas DataFrame to an existing PostgreSQL table using sqlalchemy. The create table command used to create a table in the PostgreSQL database in the following example is:

create table data( Name varchar, Age bigint);

Code:

Python3

import psycopg2
import pandas as pd
from sqlalchemy import create_engine


conn_string = 'postgres://user:password@host/data1'

db = create_engine(conn_string)
conn = db.connect()


# our dataframe
data = {'Name': ['Tom', 'dick', 'harry'],
        'Age': [22, 21, 24]}

# Create DataFrame
df = pd.DataFrame(data)
df.to_sql('data', con=conn, if_exists='replace',
          index=False)
conn = psycopg2.connect(conn_string
                        )
conn.autocommit = True
cursor = conn.cursor()

sql1 = '''select * from data;'''
cursor.execute(sql1)
for i in cursor.fetchall():
    print(i)

# conn.commit()
conn.close()

Output:

('Tom', 22)
('dick', 21)
('harry', 24)

Output in PostgreSQL:

Example 2:

Insert a pandas DataFrame to an existing PostgreSQL table without using sqlalchemy. As usual, we form a connection to PostgreSQL using the connect() command and execute the execute_values() method, where there's the 'insert' SQL command is executed. a try-except clause is included to make sure the errors are caught if any.

To view or download the CSV file used in the below program: click here.

The create table command used to create a table in the PostgreSQL database in the following example is :

create table fossil_fuels_c02(year int, country varchar,total int,solidfuel int, liquidfuel int,gasfuel int,cement int,gasflaring int,percapita int,bunkerfuels int);

Code:

Python3

import psycopg2
import numpy as np
import psycopg2.extras as extras
import pandas as pd


def execute_values(conn, df, table):

    tuples = [tuple(x) for x in df.to_numpy()]

    cols = ','.join(list(df.columns))
    # SQL query to execute
    query = "INSERT INTO %s(%s) VALUES %%s" % (table, cols)
    cursor = conn.cursor()
    try:
        extras.execute_values(cursor, query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        cursor.close()
        return 1
    print("the dataframe is inserted")
    cursor.close()


conn = psycopg2.connect(
    database="ENVIRONMENT_DATABASE", user='postgres', password='pass', host='127.0.0.1', port='5432'
)

df = pd.read_csv('fossilfuels.csv')

execute_values(conn, df, 'fossil_fuels_c02')