Loading large collections
06 Oct 2009, by Chris Norton
7 Comments · Posted in Magento

    Working with large Magento collections in PHP can often cause problems with memory usage and, to a lesser extent, computational overhead. The common method of loading all objects in the collection quickly becomes problematic as collection size increases. For stores with thousands of products or categories, chances are you'll be running into PHP memory limits more often than not. This post explains how to process collections in a less memory-intensive way and should be a starting point for working with any large collection.

    It's easy to tell when memory is becoming an issue as you'll see an error along the lines of:

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 49 bytes) in /var/www/magento/lib/Zend/Db/Statement/Pdo.php on line 294

    Often it is simple enough to increase the value of memory_limit using a .htaccess file or the system PHP configuration. However, as collection sizes grow, eventually you reach a point where this becomes unreasonable and there is a need to reduce the memory being used instead of trying to allocate more. In the following example I will be loading a product collection and processing each of the products in such a way that it should scale to almost any collection size with a reasonable amount of memory usage. Similar code will work for any kind of collection, such as products, categories, orders and so forth.
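    For reference, raising the limit looks like the snippets below. Treat the 256M figure as an illustrative value rather than a recommendation, and note that the .htaccess form only works under Apache with mod_php and AllowOverride enabled:

    # .htaccess (Apache with mod_php)
    php_value memory_limit 256M

    ; php.ini
    memory_limit = 256M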

    First, we'll begin by creating a callback function that will process each product and do something with the information. You can find more information about callbacks in the PHP Manual. For this example I'll simply print out some basic details, along with an extra argument which is passed to the callback function (more on this later).

    function productCallback($args)
    {
        $product = Mage::getModel('catalog/product');
        $product->setData($args['row']);
        echo $args['arg1'] . "\n";
        echo $product->getSku() . ': ' . $product->getName() . "\n";
    }

    Note the use of $args['row'] to retrieve the product information. The row key will, as you would expect, contain the data for each row of the collection. In this case, the data maps to a product, so I have loaded it into a product object for easy processing.

    Next, we set up the product collection to be processed. This code should be familiar to anyone who has worked with Magento collection objects before. One thing to keep in mind is that the attributes you select are important - only those that are explicitly selected will be available to use in the callback function.

    $products = Mage::getModel('catalog/product')->getCollection()
                        ->addAttributeToSelect(array('name', 'short_description', 'description', 'price', 'image', 'status', 'manufacturer', 'url_path'), 'inner');

    The final piece of code below is the key to reducing memory usage:

    Mage::getSingleton('core/resource_iterator')->walk($products->getSelect(), array('productCallback'), array('arg1' => '===='));

    Essentially, it steps through the results returned from the database one at a time and passes each row to the callback function defined above, rather than loading every object in the collection at once. Skipping the full object load saves memory, since unneeded data is never held in memory all at once, and it also saves processing time, as there's a lot of logic involved in loading most objects.

    Note that the last two arguments shown above map indirectly to PHP's call_user_func function. The first array lists the callback functions to use and the second array holds additional arguments to pass to each callback. I have used the example of a simple callback function that takes an extra argument of arg1. If you're using this code inside a class and want to use a class method for the callback instead of a plain function, change the code to:

    Mage::getSingleton('core/resource_iterator')->walk($products->getSelect(), array(array($this, 'productCallback')), array('arg1' => '===='));
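    Putting the pieces together, a class-based version might look like the following sketch. The class and method names here are placeholders of my own choosing, and the code assumes it runs inside a bootstrapped Magento environment:

    class My_Module_Model_Exporter
    {
        public function export()
        {
            // Select only the attributes the callback actually needs
            $products = Mage::getModel('catalog/product')->getCollection()
                ->addAttributeToSelect(array('name', 'price'), 'inner');

            // Walk the result set row by row instead of loading the whole collection
            Mage::getSingleton('core/resource_iterator')->walk(
                $products->getSelect(),
                array(array($this, 'productCallback')),
                array('arg1' => '====')
            );
        }

        public function productCallback($args)
        {
            // $args['row'] holds the raw data for one row of the result set
            $product = Mage::getModel('catalog/product');
            $product->setData($args['row']);
            echo $args['arg1'] . "\n";
            echo $product->getSku() . ': ' . $product->getName() . "\n";
        }
    }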

    Excessive memory usage is something that we've run into frequently when working with large collections and I'm sure many others have as well. Hopefully the above code serves as a good starting point for developing your own solutions that minimise memory usage and improve performance.