PyPy uses a completely different approach to object layout from CPython's. It saves a large amount of memory (25% to 50%) in per-object overhead when there are many objects with the same layout (the same attributes).
I haven't seen any information on whether the added indirection carries a slight performance penalty, although I believe the PyPy JIT is able to optimize it away. I'm also not sure whether it makes native code extensions more difficult. The savings are big enough to warrant serious consideration, though.
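To make the idea concrete, here is a rough pure-Python sketch of how I understand the "maps" layout: each instance keeps only a flat list of values plus a pointer to a shared map that records which attribute lives in which slot. The names `Map`, `Instance`, `with_attribute`, and `make_point` are made up for illustration; this is not PyPy's actual implementation.

```python
class Map:
    """Hypothetical shared layout descriptor: attribute name -> slot index."""

    def __init__(self, names=()):
        self.index = {name: i for i, name in enumerate(names)}
        self.transitions = {}  # attribute name -> Map with that attribute added

    def with_attribute(self, name):
        # Reuse one successor map per added attribute, so instances that are
        # built in the same order end up sharing the same Map object.
        if name not in self.transitions:
            self.transitions[name] = Map(tuple(self.index) + (name,))
        return self.transitions[name]


EMPTY_MAP = Map()


class Instance:
    """Hypothetical instance: a pointer to a shared map plus a flat value list."""

    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def write(self, name, value):
        slot = self.map.index.get(name)
        if slot is None:
            # New attribute: move to the successor map and grow the storage.
            self.map = self.map.with_attribute(name)
            self.storage.append(value)
        else:
            self.storage[slot] = value

    def read(self, name):
        # The extra lookup through the map is the "added indirection".
        return self.storage[self.map.index[name]]


def make_point(x, y):
    p = Instance()
    p.write("x", x)
    p.write("y", y)
    return p


a = make_point(1, 2)
b = make_point(3, 4)
print(a.map is b.map)  # both instances point at the same layout object
```

The per-instance cost here is one list of values and one reference, while the name-to-slot dictionary is stored once per layout. That is where the savings would come from when many objects have the same attributes.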
There seems to be at least one consequence of this choice for PyPy, though a minor one. Quoting the PyPy documentation:
sys.getsizeof() always raises TypeError. This is because a memory profiler using this function is most likely to give results inconsistent with reality on PyPy. It would be possible to have sys.getsizeof() return a number (with enough work), but that may or may not represent how much memory the object uses. It doesn’t even make really sense to ask how much one object uses, in isolation with the rest of the system. For example, instances have maps, which are often shared across many instances; in this case the maps would probably be ignored by an implementation of sys.getsizeof(), but their overhead is important in some cases if they are many instances with unique maps. Conversely, equal strings may share their internal string data even if they are different objects—or empty containers may share parts of their internals as long as they are empty. Even stranger, some lists create objects as you read them; if you try to estimate the size in memory of range(10**6) as the sum of all items’ size, that operation will by itself create one million integer objects that never existed in the first place. Note that some of these concerns also exist on CPython, just less so. For this reason we explicitly don’t implement sys.getsizeof().
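For code that has to run on both interpreters, a small guard around `sys.getsizeof()` is probably enough. This is only a sketch; `shallow_size` is a hypothetical helper name, and returning `None` when no answer is available is my own choice.

```python
import platform
import sys


def shallow_size(obj):
    """Return sys.getsizeof(obj), or None if the interpreter refuses to answer."""
    try:
        return sys.getsizeof(obj)
    except TypeError:
        # Per the text quoted above, PyPy raises TypeError here.
        return None


result = shallow_size([1, 2, 3])
print(platform.python_implementation(), result)
```

Even where a number comes back, it is a shallow size for that one object and says nothing about shared internals, which is the point the quoted text makes.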
Details here:
https://dev.nextthought.com/blog/2018/08/cpython-vs-pypy-memory-usage.html
https://morepypy.blogspot.com/2010/11/efficiently-implementing-python-objects.html#using-maps-for-memory-efficient-instances