Description
This would allow pandas objects to be collected in the first generation of the gc rather than waiting for it to break reference cycles. In practice, I am not sure this will make much of a user-visible difference.
dask/distributed#956 (comment)
The idea is to change this code here
from
    class _NDFrameIndexer(object):
        _valid_types = None
        _exception = KeyError
        axis = None

        def __init__(self, obj, name):
            self.obj = obj
            self.ndim = obj.ndim
            self.name = name
to
    import weakref

    class _NDFrameIndexer(object):
        _valid_types = None
        _exception = KeyError
        axis = None

        def __init__(self, obj, name):
            # hold only a weak reference, so the indexer does not keep obj alive
            self.obj = weakref.ref(obj)
            self.ndim = obj.ndim
            self.name = name
and every corresponding use of self.obj changed to self.obj().
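To make the effect concrete, here is a minimal sketch with stand-in classes (Frame, StrongIndexer, WeakIndexer and freed_on_del are hypothetical names, not pandas code). It assumes the object keeps a reference to its indexer, so that a strong back-reference forms a cycle, and it relies on CPython's reference counting:

```python
import gc
import weakref


class StrongIndexer:
    """Stand-in for the current behaviour: strong reference to obj."""

    def __init__(self, obj, name):
        self.obj = obj
        self.ndim = obj.ndim
        self.name = name


class WeakIndexer:
    """Stand-in for the proposed behaviour: weak reference to obj."""

    def __init__(self, obj, name):
        self.obj = weakref.ref(obj)
        self.ndim = obj.ndim
        self.name = name


class Frame:
    """Hypothetical object that caches its indexer (assumption)."""

    ndim = 2

    def __init__(self, indexer_cls):
        # frame -> indexer -> frame is a cycle when the indexer is strong
        self.loc = indexer_cls(self, "loc")


def freed_on_del(indexer_cls):
    """Return True if the frame is freed by `del` alone, without a gc pass."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out the cyclic collector for the duration of the check
    try:
        frame = Frame(indexer_cls)
        probe = weakref.ref(frame)
        del frame
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()


print("weak indexer freed on del:", freed_on_del(WeakIndexer))
print("strong indexer freed on del:", freed_on_del(StrongIndexer))
```

With the strong reference, the frame survives `del` until the cyclic collector runs; with the weak reference, reference counting alone frees it.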
It 'works' in that gc collection happens immediately upon object deletion (IOW, del df), but there are a few failures in the caching / chaining tests. In particular, I think tests like https://github.com/pandas-dev/pandas/blob/master/pandas/tests/indexing/test_chaining_and_caching.py#L31 were relying on the reference NOT being collected (so that they can check it).
So this would require some internal reworking to remove or fix that reliance. I suspect we can still achieve the same user-facing effects (i.e. detection of chaining, etc.).
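One general hazard that any such rework would have to handle is that a weak reference returns None once its referent is gone. The sketch below illustrates only that hazard, not what the linked test actually does; Frame, WeakIndexer, target and make_orphan are hypothetical names, and the immediate collection of the temporary assumes CPython:

```python
import weakref


class Frame:
    """Hypothetical stand-in for a pandas object."""

    ndim = 2


class WeakIndexer:
    """Stand-in for an indexer that holds its object weakly."""

    def __init__(self, obj, name):
        self.obj = weakref.ref(obj)
        self.ndim = obj.ndim
        self.name = name

    def target(self):
        # Dereference the weak reference and fail loudly if it is dead.
        obj = self.obj()
        if obj is None:
            raise ReferenceError("indexed object has been garbage collected")
        return obj


def make_orphan():
    # The temporary Frame has no other owner, so it is freed as soon as
    # the constructor returns; the indexer outlives the object it indexes.
    return WeakIndexer(Frame(), "loc")


frame = Frame()
live = WeakIndexer(frame, "loc")
print(live.target() is frame)

orphan = make_orphan()
print(orphan.obj() is None)
```

Code that previously counted on the indexer keeping its object alive would need either an explicit check like target() or a strong reference held somewhere else.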