Closed
Labels: api: bigquery, type: feature request
Description
Version 2.4.0 of the library allocates much more memory than the previous version, 2.3.1, when running multiple queries. In particular, the QueryJob object appears to retain the query results internally, and that memory is never deallocated.
I think that the problem is related to #374.
Environment details
- OS: macOS 11.0.1 (also observed on Linux in a production environment)
- Python version: 3.8.6
- pip version: 20.1.1
- google-cloud-bigquery version: 2.4.0
Steps to reproduce
Run the script in the code example with google-cloud-bigquery 2.4.0 and then with 2.3.1.
You will also need to install:
- google-cloud-bigquery-storage==2.1.0
- pandas==1.1.4
- psutil==5.7.3
The outputs on my machine are:
With 2.4.0:
Initial memory used: 77 MB
Memory used: 642 MB
Memory used: 875 MB
Memory used: 1117 MB
Memory used: 1342 MB
Memory used: 1568 MB
Memory used: 1792 MB
Memory used: 2039 MB
Memory used: 2265 MB
Memory used: 2505 MB
Memory used: 2725 MB
With 2.3.1:
Initial memory used: 77 MB
Memory used: 97 MB
Memory used: 98 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 99 MB
Memory used: 100 MB
Memory used: 101 MB
Memory used: 101 MB
Memory used: 101 MB
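To quantify the difference, the average memory retained per additional query can be computed from the readings above (the lists below simply transcribe the reported numbers):

```python
# Memory readings (MB) reported above, one per completed query.
mem_240 = [642, 875, 1117, 1342, 1568, 1792, 2039, 2265, 2505, 2725]
mem_231 = [97, 98, 99, 99, 99, 99, 100, 101, 101, 101]

def mean_growth(readings):
    """Average MB retained per additional query (excluding the first)."""
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    return sum(deltas) / len(deltas)

print(f"2.4.0: ~{mean_growth(mem_240):.0f} MB retained per query")
print(f"2.3.1: ~{mean_growth(mem_231):.1f} MB retained per query")
```

Roughly 230 MB is retained per query with 2.4.0, versus well under 1 MB with 2.3.1.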
Code example
Please note that we store a reference to the QueryJob objects, but not to the resulting DataFrames.
import os

import psutil
from google.cloud import bigquery

if __name__ == '__main__':
    client = bigquery.Client()
    process = psutil.Process(os.getpid())
    print(f"Initial memory used: {process.memory_info().rss / 1e6:.0f} MB")

    jobs = []
    for i in range(10):
        job = client.query("SELECT x FROM UNNEST(GENERATE_ARRAY(1, 1000000)) AS x")
        job.result().to_dataframe()
        jobs.append(job)
        print(f"Memory used: {process.memory_info().rss / 1e6:.0f} MB")
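Assuming the extra memory lives in per-job caches held by the QueryJob object (as #374 suggests), one possible workaround is to retain only the lightweight job ID string and re-fetch the job via the client when it is actually needed, instead of keeping the QueryJob objects alive. The sketch below illustrates the pattern with a hypothetical FakeQueryJob stand-in; the class, its _first_page attribute, and the simulated cache are all invented for illustration and are not part of the library's API:

```python
import weakref

class FakeQueryJob:
    """Hypothetical stand-in for bigquery.QueryJob, which in 2.4.0
    appears to cache the first page of results internally."""
    def __init__(self, job_id):
        self.job_id = job_id
        self._first_page = list(range(100_000))  # simulated cached rows

def run_and_track(job_ids):
    """Workaround pattern: keep only the job_id string, not the job object."""
    job = FakeQueryJob(f"job-{len(job_ids)}")
    job_ids.append(job.job_id)
    return weakref.ref(job)  # lets us check that the job is collectable

job_ids = []
ref = run_and_track(job_ids)
# With no strong reference kept, the job (and its cached page) can be freed.
print(job_ids, ref() is None)
```

With the real library, the equivalent would be appending `job.job_id` instead of `job` in the loop above, re-fetching a job later with `client.get_job(job_id)` if needed.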