pandas.read_iceberg
- pandas.read_iceberg(table_identifier, catalog_name=None, catalog_properties=None, row_filter=None, selected_fields=None, case_sensitive=True, snapshot_id=None, limit=None, scan_properties=None)
Read an Apache Iceberg table into a pandas DataFrame.
Warning
read_iceberg is experimental and may change without warning.
- Parameters:
- table_identifier : str
Table identifier.
- catalog_name : str, optional
The name of the catalog.
- catalog_properties : dict of {str: str}, optional
Properties that are used in addition to the catalog configuration.
- row_filter : str, optional
A string that describes the desired rows.
- selected_fields : tuple of str, optional
A tuple of strings representing the column names to return in the output DataFrame.
- case_sensitive : bool, default True
If True, column matching is case sensitive.
- snapshot_id : int, optional
Snapshot ID to time travel to. By default, the table is scanned as of the current snapshot ID.
- limit : int, optional
The number of rows to return in the scan result. By default, all matching rows are fetched.
- scan_properties : dict of {str: obj}, optional
Additional table properties, as a dictionary of string key-value pairs, to use for this scan.
- Returns:
- DataFrame
DataFrame based on the Iceberg table.
See also
read_parquet
Read a Parquet file.
Examples
>>> df = pd.read_iceberg(
...     table_identifier="my_table",
...     catalog_name="my_catalog",
...     catalog_properties={"s3.secret-access-key": "my-secret"},
...     row_filter="trip_distance >= 10.0",
...     selected_fields=("VendorID", "tpep_pickup_datetime"),
... )
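The optional scan parameters can be combined, for instance to preview a few rows of an older snapshot. The following is a minimal sketch rather than part of the pandas API: `build_scan_kwargs` and `read_preview` are hypothetical helpers, the snapshot ID is a made-up placeholder, and the actual read assumes a pandas version that provides `read_iceberg`, an installed PyIceberg, and a configured catalog.

```python
def build_scan_kwargs(table_identifier, *, catalog_name=None, row_filter=None,
                      selected_fields=None, snapshot_id=None, limit=None):
    """Hypothetical helper: collect only the options the caller actually set.

    Anything left as None is omitted, so read_iceberg falls back to its
    documented defaults (current snapshot, all columns, all matching rows).
    """
    kwargs = {"table_identifier": table_identifier}
    if catalog_name is not None:
        kwargs["catalog_name"] = catalog_name
    if row_filter is not None:
        kwargs["row_filter"] = row_filter
    if selected_fields is not None:
        # The parameter is documented as a tuple of str, so normalize lists.
        kwargs["selected_fields"] = tuple(selected_fields)
    if snapshot_id is not None:
        kwargs["snapshot_id"] = int(snapshot_id)
    if limit is not None:
        if limit < 0:
            raise ValueError("limit must be non-negative")
        kwargs["limit"] = int(limit)
    return kwargs


def read_preview(table_identifier, **options):
    """Hypothetical wrapper: read the table with the assembled options."""
    # Imported lazily so the helper above stays usable without pandas.
    import pandas as pd

    return pd.read_iceberg(**build_scan_kwargs(table_identifier, **options))


# Assemble the arguments for a five-row preview of a past snapshot.
# 1234567890 is a placeholder, not a real snapshot ID.
kwargs = build_scan_kwargs(
    "my_table",
    selected_fields=["VendorID", "tpep_pickup_datetime"],
    snapshot_id=1234567890,
    limit=5,
)
```

Calling `read_preview("my_table", snapshot_id=..., limit=5)` would then perform the read against whichever catalog is configured. Keeping argument assembly separate from the read makes the option handling easy to check without a live catalog.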