Loading large collections
06 Oct 2009, by Chris Norton
7 Comments · Posted in Magento

    Working with large Magento collections in PHP can often cause problems with memory usage and, to a lesser extent, computational overhead. The common method of loading all objects in the collection quickly becomes problematic as collection size increases. For stores with thousands of products or categories, chances are you'll be running into PHP memory limits more often than not. This post explains how to process collections in a less memory intensive way and should be a starting point for working with any large collection.

    It's easy to tell when memory is becoming an issue as you'll see an error along the lines of:

    Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 49 bytes) in /var/www/magento/lib/Zend/Db/Statement/Pdo.php on line 294
    

    Often it is simple enough to increase the value of memory_limit using a .htaccess file or the system PHP configuration. However, as collection sizes grow, eventually you reach a point where this becomes unreasonable and there is a need to reduce the memory being used instead of trying to allocate more. In the following example I will be loading a product collection and processing each of the products in such a way that it should scale to almost any collection size with a reasonable amount of memory usage. Similar code will work for any kind of collection, such as products, categories, orders and so forth.

    First, we'll begin by creating a callback function that will process each product and do something with the information. You can find more information about callbacks in the PHP Manual. For this example I'll simply print out some basic details, along with an extra argument which is passed to the callback function (more on this later).

    function productCallback($args)
    {
        $product = Mage::getModel('catalog/product');
        $product->setData($args['row']);
        echo $args['arg1'] . "\n";
        echo $product->getSku() . ': ' . $product->getName() . "\n";
    }
    

    Note the use of $args['row'] to retrieve the product information. The row key will, as you would expect, contain the data for each row of the collection. In this case, the data maps to a product, so I have loaded it into a product object for easy processing.

    Next, we set up the product collection to be processed. This code should be familiar to anyone who has worked with Magento collection objects before. One thing to keep in mind is that the attributes you select are important - only those that are explicitly selected will be available to use in the callback function.

    $products = Mage::getModel('catalog/product')->getCollection()
                        ->addAttributeToSelect('sku')
                        ->addAttributeToSelect(array('name', 'short_description', 'description', 'price', 'image', 'status', manufacturer', 'url_path'), 'inner');
    

    The final piece of code below is the key to reducing memory usage:

    Mage::getSingleton('core/resource_iterator')->walk($products->getSelect(), array('productCallback'), array('arg1' => '===='));
    

    Essentially what it does is steps through the results returned from the database one by one and sends each row to the callback function defined above. This is in contrast to loading all objects in the collection result all at once. Not loading objects individually saves memory by not loading in all the unneeded data, and also on processing time as there's a lot of logic involved in loading most objects.

    Note that the second two arguments shown above are indirectly mapped to the call_user_func function in PHP. The first array is the callback function to use and the second array represents additional arguments to pass to the callback. I have used the example of a simple callback function that has an extra argument of arg1. If you're using this code inside a class and want to use a class method for the callback instead of a function, you'll need to change the code to be:

    Mage::getSingleton('core/resource_iterator')->walk($products->getSelect(), array(array($this, 'productCallback')), array('arg1' => '===='));
    

    Excessive memory usage is something that we've run into frequently when working with large collections and I'm sure many others have as well. Hopefully the above code serves as a good starting point for developing your own solutions that minimise memory usage and improve performance.

    Comments

    This is a great way to do it, collections can be an absolute nightmare in terms of memory - especially when trying to update a large catalog with a php script. Nice work
    Comment by Joe Robertson - 10 Oct 2009 8:32:02 PM
    Interesting approach, love it :)
    Comment by Branko Ajzele - 22 Oct 2009 11:09:00 AM
    What's the different between this approach and by using getCollection()... ->walk('callbackmethod') ?

    I don't see any big difference in term of memory usage.
    Comment by Gunardi Wu - 13 Mar 2010 8:46:10 PM
    It's more a matter of difference between how the two work, rather than there being a difference in memory usage. As far as I'm aware, using getCollection()... ->walk('callbackmethod') requires callbackmethod to be a method of Mage_Catalog_Model_Product. Using the iterator singleton directly allows you to use arbitrary functions and methods, which is more flexible.
    Comment by Chris Norton - 14 Mar 2010 11:11:07 PM
    If you have an list of entity ids, another option to avoid the memory leak and reuse models is to use
    [code]
    <?php
    $product = Mage::getModel('catalog/product');
    foreach ($ids as $id) {
    $product->getResource()->load($product, $id);
    /* do stuff */
    }
    ?>
    [/code]
    Comment by Vinai - 21 Apr 2010 2:23:00 AM
    I have recently been taking a look at your method of reducing memory allocation in magento by using a callback function when iterating through a product collection, I have managed to get it working fine apart from I am not able to pull any other information from the collection other than sku.
    $product->getName() returns nothing however if you pull $product->getName() before you pass the collection through the walk() and into the callback it works.

    Could you offer an explanation for this? I have made sure 'Name' is on my addAttributeToSelect() list.
    Comment by Chris - 29 Dec 2010 9:42:00 PM
    While this could use even less memory, it requires one SELECT query for each product, the collection approach only needs one query. Additionally collections can figure out which ids to process, which can save an additional query.
    Comment by emislive - 30 Dec 2010 6:19:00 AM
    Comments are closed for this post