How to Extract PDF Tables in Python?
Last Updated: 27 May, 2025
When handling data in PDF files, you may need to extract tables for use in Python programs. PDFs (Portable Document Format) preserve the layout of text, images and tables across platforms, making them ideal for sharing consistent document formats. For example, a PDF might contain a table like:
User_ID | Name | Occupation |
1 | David | Product Manager |
2 | Leo | IT Administrator |
3 | John | Lawyer |
The goal is to read this table into a Python program. Several libraries can do this, and the sections below cover each in turn.
Using pdfplumber
If you want a straightforward way to look inside a PDF and pull out tables without much setup, pdfplumber is a great choice. It analyzes the layout of each page to locate tables, then returns every table as a list of rows, where each row is a list of cell values you can use in your program.
Python
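import pdfplumber

# Open the PDF; the context manager closes the file when done
with pdfplumber.open("example.pdf") as pdf:
    for p in pdf.pages:
        # extract_tables() returns one list of rows per table on the page
        for t in p.extract_tables():
            for r in t:
                print(r)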
Explanation: This code uses pdfplumber.open() to safely open the PDF, iterates through the pages with pdf.pages, extracts the tables on each page using extract_tables() and prints each row as a list of cell values.
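Because each extracted table is a list of rows, the first row can be used to label the remaining cells. The sketch below assumes the first row of every table holds the column names, which fits the sample table above but will not hold for every PDF. The helper names rows_to_records and extract_records are illustrative and not part of pdfplumber.

```python
def rows_to_records(table):
    """Convert one table (list of rows, header first) into a list of dicts."""
    if not table:
        return []
    header, *body = table
    # Empty cells can come back as None, so normalise before stripping
    keys = [(cell or "").strip() for cell in header]
    return [dict(zip(keys, row)) for row in body]


def extract_records(path):
    """Collect records from every table on every page of the PDF at `path`."""
    import pdfplumber  # third-party; imported here so the helper above works alone

    records = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(rows_to_records(table))
    return records


# A table shaped like the rows extract_tables() returns (assumed sample data)
sample = [
    ["User_ID", "Name", "Occupation"],
    ["1", "David", "Product Manager"],
    ["2", "Leo", "IT Administrator"],
]
records = rows_to_records(sample)
print(records[0]["Name"])
```

Calling extract_records("example.pdf") would apply the same conversion to a real file, provided pdfplumber is installed and the tables have header rows.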
Using camelot
When your PDF has cleanly drawn tables with clear ruling lines or consistent spacing between columns, Camelot works well. It detects these tables and returns each one as a pandas DataFrame that you can handle directly in Python, which makes it handy when you want quick, clean results. The example below reads a file named test.pdf.
Python
import camelot
# Read tables
a = camelot.read_pdf("test.pdf")
# Print first table
print(a[0].df)
Explanation: camelot.read_pdf() extracts the tables from the PDF file "test.pdf" and stores all detected tables in the variable a. The first table (a[0]) is then accessed and its content is printed as a DataFrame using .df.
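Camelot chooses its parsing mode through the flavor argument of read_pdf(): "lattice" (the default) targets tables drawn with ruling lines, while "stream" targets tables separated only by whitespace. Each detected table also carries a parsing_report dictionary with an "accuracy" entry. The following is a minimal sketch of using both; the helper names and the 90.0 threshold are assumptions chosen for illustration, not recommended values.

```python
def keep_accurate(tables, min_accuracy=90.0):
    """Return only the tables whose parsing report meets the accuracy threshold."""
    return [t for t in tables if t.parsing_report["accuracy"] >= min_accuracy]


def read_tables(path, ruled=True):
    """Read tables from page 1 of `path`, choosing the flavor by table style."""
    import camelot  # third-party; imported lazily so keep_accurate works alone

    # "lattice" expects drawn cell borders, "stream" relies on whitespace
    flavor = "lattice" if ruled else "stream"
    tables = camelot.read_pdf(path, pages="1", flavor=flavor)
    return keep_accurate(tables)
```

Each table returned by read_tables("test.pdf") still exposes .df, so it can be printed or processed the same way as a[0].df.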
Using Tabula-py
If you don't mind installing Java on your computer, Tabula-py is a powerful option. It is a Python wrapper around the Java tool Tabula, so it needs a Java runtime to work. It handles table extraction well, even for complex PDFs, and hands you the data as pandas DataFrames inside Python.
Python
from tabula import read_pdf
from tabulate import tabulate

# read_pdf() returns a list of DataFrames, one per detected table
dfs = read_pdf("abc.pdf", pages="all")  # path to the PDF file
for df in dfs:
    print(tabulate(df, headers="keys"))
Explanation: This code uses read_pdf() from Tabula-py to extract the tables from all pages of "abc.pdf". The call returns a list of DataFrames, one per detected table, so the code loops over the list and prints each DataFrame in a clean, formatted table style using tabulate().
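Since Tabula-py depends on Java, a missing runtime is a common cause of failures. The sketch below checks for a java executable before calling read_pdf(). It assumes Java is exposed on the PATH, which may be too strict for setups that configure Java another way, and the helper names java_available and read_all_tables are illustrative.

```python
import shutil


def java_available():
    """Return True if a `java` executable can be found on the PATH."""
    return shutil.which("java") is not None


def read_all_tables(path):
    """Read every table in `path`, failing early with a clear message if Java is missing."""
    if not java_available():
        raise RuntimeError("Tabula-py needs a Java runtime; install Java and retry.")
    from tabula import read_pdf  # third-party; imported only once Java is confirmed

    return read_pdf(path, pages="all")


print("Java found:", java_available())
```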
Using PyMuPDF
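Sometimes tables aren't cleanly formatted, or you want all the text details and not just the tables. PyMuPDF lets you open PDFs and extract all the text together with its position and font information, giving you full control. The text extraction shown here does not identify tables for you, so you rebuild rows and columns yourself, but it is a flexible tool if you are ready to do some manual work. Recent PyMuPDF releases also provide a Page.find_tables() method for automatic table detection.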
Python
import fitz  # PyMuPDF

d = fitz.open("example.pdf")
for p in d:
    # "dict" returns blocks, lines and spans with font and position info
    t = p.get_text("dict")
    print(t)
Explanation: This code opens the PDF file "example.pdf" using PyMuPDF (fitz). It loops through each page, extracts the page's text as a detailed dictionary (get_text("dict")), which includes text blocks, fonts and layout info, then prints this structured text data.
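The manual work mentioned above usually means rebuilding rows from text positions. One possible approach, sketched below, uses get_text("words"), which returns tuples that start with (x0, y0, x1, y1, word), and groups words whose top coordinates lie close together. The helper names and the 3.0-point tolerance are assumptions, and the sketch splits multi-word cells into separate words, so real tables need additional column logic.

```python
def group_words_into_rows(words, y_tolerance=3.0):
    """Group word tuples into rows by their top coordinate, ordered left to right."""
    rows = []  # each entry: (y0 of the row's first word, list of word tuples)
    for word in sorted(words, key=lambda w: (w[1], w[0])):
        if rows and abs(word[1] - rows[-1][0]) <= y_tolerance:
            rows[-1][1].append(word)
        else:
            rows.append((word[1], [word]))
    # Keep only the text (index 4), sorted by the left edge (index 0)
    return [[w[4] for w in sorted(ws, key=lambda w: w[0])] for _, ws in rows]


def page_rows(path, page_number=0):
    """Return the grouped rows for one page of the PDF at `path`."""
    import fitz  # PyMuPDF; imported lazily so the helper above works alone

    with fitz.open(path) as doc:
        return group_words_into_rows(doc[page_number].get_text("words"))


# Hand-made tuples in the layout of get_text("words") (assumed sample data)
sample = [
    (50, 100.5, 80, 110, "David", 0, 0, 1),
    (10, 100.0, 30, 110, "1", 0, 0, 0),
    (10, 120.0, 30, 130, "2", 0, 1, 0),
    (50, 120.0, 80, 130, "Leo", 0, 1, 1),
]
print(group_words_into_rows(sample))
```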