You may be wondering why your Magento database is so big. It is quite common to have huge Magento database exceeding 1 GB in size. But when its size grows near to 5 GB, you have to start asking your self questions. Does my store has so many products in catalog or there’s something wrong with the database? Let’s find out.
First, let’s find out which Magento database tables are taking most of the space. Connect to your favorite MySQL client (worst case, PhpMyAdmin 🙂 ) and run queries below:
Find largest tables in all databases
SELECT table_schema as "Database", table_name AS "Table",
ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "Size (MB)"
FROM information_schema.TABLES
WHERE (data_length + index_length) > 1000000
ORDER BY table_schema, (data_length + index_length) DESC;
This lists tables from all databases, which exceed 1 MB, sorted by size.
Find the largest tables in a specific database
If you’re interested in a specific database only, you can run:
SELECT table_name AS "Table",
ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "Size (MB)"
FROM information_schema.TABLES
WHERE (data_length + index_length) > 1000000 AND table_schema = "magento"
ORDER BY (data_length + index_length) DESC;
… where magento
is your Magento database name,
You will see a sorted list like the following:
sales_flat_quote_address 2761.75
sales_flat_quote 2400.58
report_viewed_product_index 566.08
sales_flat_order_item 457.58
customer_address_entity_varchar 201.39
customer_entity_varchar 190.44
customer_entity_int 165.33
sales_flat_quote_item_option 116.14
Solutions below are for Magento 1.x only
Huge catalogsearch_query
table
This table stores frequent searches on your website. Truncating is most safe to be done with the following query:
DELETE FROM `catalogsearch_query` WHERE `synonym_for` IS NULL AND `redirect` IS NULL`
This leaves out manually added synonyms and redirects.
Huge core_cache
table
My observation is similar to the following:
Amasty Full Page Cache uses its own Table called core_cache. This database will bloat as visitors visit the website. It is a big flaw with the software. Our database increased from 300Mb up to 8.5GB which resulted in slow database transactions (i.e. add to cart, login, etc), and we have a website with approx 1,000 products.
So it’s best to remove it and use
Huge sales_flat_quote and sales_flat_quote_address tables
As you can see from the output, sales_flat_quote_address and sales_flat_quote each taking up whopping 2+ GB, totalling 5 GB.
Mostly those tables contain abandoned cart data. Now, do we really need to waste 5 GB of space for that?
And worse, if you’re performance oriented like myself, you know that you have to allocate MySQL InnoDB buffer pool that matches your data in size. So do we also need to waste 5 GB RAM for abandoned cart data buffers?
Of course, no. So let’s reduce the size, save some space and gain some performance.
Quick fix
Now there’s a time for simple one time solution:
DELETE FROM sales_flat_quote WHERE updated_at < DATE_SUB(Now(),INTERVAL 60 DAY);
This will remove data for quotes which were not converted within 60 days. No need to run extra query for sales_flat_quote_address
. Since Magento uses databases triggers in its database schemata, the related tables data will be automatically removed from dependent tables.
Long term solution #1
Query fixed the problem for some time. But people are still visiting your store and some of them will surely enough abandon the website. The problem persists. So a better approach would be to regularly cleanup those tables via cron. A plugin comes to the rescue. Meet Aoe Quote Cleaner.
Long term solution #2
A better approach as it does not require use of yet another plugin, is leverage MySQL event schedule.
First you have to enable it in my.cnf
:
[mysqld]
event_scheduler=ON
Do not restart MySQL to apply this change as you can apply it dynamically via:
SET GLOBAL event_scheduler = ON;
Then create an actual schedule to run daily cleanup of records which are older than 60 days:
CREATE EVENT IF NOT EXISTS clean_sales_flat_quote
ON SCHEDULE EVERY 1 DAY
DO
DELETE FROM sales_flat_quote
WHERE updated_at < DATE_SUB(Now(),INTERVAL 60 DAY);
Huge dataflow_batch_export
table
This table is populated with entries when running a data export. If you running exports programmatically, e.g. via cron, make sure to add up cleaning routing to your script.
A sample script that does it can be found in a Gist.
If you are not running an export job right now, but still the table is big, truncating it is safe.
Huge core_session
table (core_session.ibd)
This table is used when you store Magento sessions in MySQL backend. This is the worst place you can store them.
First, it’s only needed for multi-server setup. And second, for multi-server setup you better off using Redis.
To clean up the huge core_session
table after switching to a different backend, you can:
n98-magerun db:query "TRUNCATE TABLE core_session"
Huge cron_schedule
table
This table rarely gets large, but it happens that after disabling of a plugin, its cron tasks are still being scheduled and never cleaned up. Remove the plugin completely and cleanup all but recent scheduled cron tasks:
DELETE FROM cron_schedule WHERE scheduled_at < NOW() - INTERVAL 2 DAY;
Huge report_viewed_product_index table
Do you really need that data?
This table stores data about viewed products: who viewed which product. This is useful for showing blocks like “Recently viewed products”. Another use case is gathering stats about most viewed products. If you don’t have these use cases and not planning to implement them, it is safe to truncate the table daily.
In that way, you will not waste performance for useless data.
While you could have set a daily cron running a TRUNCATE query against the table in question, there’s a better and more elegant approach.
You can make use of Blackhole store engine. It acts like /dev/null for database writes. So all the unnecessary I/O and disk space will be saved:
ALTER TABLE report_viewed_product_index ENGINE=blackhole;
You can also disable log events (untested).
But if you need that data…
However, if you do need that data, there are some more interesting insights about it. Knowing these will allow you to weed out the large chunks of unneeded records:
Typically the report_viewed_product_index
records would have:
visitor_id
=NULL
when it is a customer, andcustomer_id
is notNULL
.customer_id
=NULL
when it is not a logged in customer, andvisitor_id
is notNULL
.
The case where both are NULL is not an easy one.
The vistor_id
is retrieved from the Mage_Log_Model_Visitor
class.
The only possibility is that it is a visitor
that has an excluded user agent, which means that is a bot (Google Bot, etc.).
As you can see from the constructor, the request logging is skipped in that case:
protected function _construct()
{
$this->_init('log/visitor');
$userAgent = $this->_httpHelper->getHttpUserAgent();
$ignoreAgents = $this->_config->getNode('global/ignore_user_agents');
if ($ignoreAgents) {
$ignoreAgents = $ignoreAgents->asArray();
if (in_array($userAgent, $ignoreAgents)) {
$this->_skipRequestLogging = true;
}
}
if ($this->_logCondition->isLogDisabled()) {
$this->_skipRequestLogging = true;
}
}
The ignored user agents are defined under app/code/core/Mage/Log/etc/config.xml
and are the following:
- Googlebot/1.0 (googlebot@googlebot.com http://googlebot.com/)
- Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
- Googlebot/2.1 (+http://www.googlebot.com/bot.html)
Finally, the case where both are not NULL
It happens when:
- logged out customer visits a product page
- leaves the website
- comes back and logs in as a customer
- revisits the same product page
In that case, Magento will keep both values.
The visitor data is later set to NULL
when the customer logs out with the following code:
public function purgeVisitorByCustomer(Mage_Reports_Model_Product_Index_Abstract $object)
{
/**
* Do nothing if customer not logged in
*/
if (!$object->getCustomerId()) {
return $this;
}
$bind = array('visitor_id' => null);
$where = array('customer_id = ?' => (int)$object->getCustomerId());
$this->_getWriteAdapter()->update($this->getMainTable(), $bind, $where);
return $this;
}
What about the daily cron that does log cleaning? Inside app/code/core/Mage/Reports/Model/Resource/Product/Index/Abstract.php you can locate the clean function:
/**
* Clean index (visitor)
*
* @return Mage_Reports_Model_Resource_Product_Index_Abstract
*/
public function clean()
{
while (true) {
$select = $this->_getReadAdapter()->select()
->from(array('main_table' => $this->getMainTable()), array($this->getIdFieldName()))
->joinLeft(
array('visitor_table' => $this->getTable('log/visitor')),
'main_table.visitor_id = visitor_table.visitor_id',
array())
->where('main_table.visitor_id > ?', 0)
->where('visitor_table.visitor_id IS NULL')
->limit(100);
$indexIds = $this->_getReadAdapter()->fetchCol($select);
if (!$indexIds) {
break;
}
$this->_getWriteAdapter()->delete(
$this->getMainTable(),
$this->_getWriteAdapter()->quoteInto($this->getIdFieldName() . ' IN(?)', $indexIds)
);
}
return $this;
}
It:
- Cleans only those records in
report_viewed_product_index
wherevisitor_id
isn’t null and the corresponding entry inlog_visitor
no longer exists - Does not clean any records where
visitor_id
isNULL
(presumably this data is useful because it has to contain positivecustomer_id
). This gives you historical data about what your customers have been looking at before or after their purchase. - It fails to clean the records where both
visitor_id
andcustomer_id
areNULL
. This is what we will take care about, because this is truly a bug that Magento won’t clean this.
Even if you had setup Magento 1.9 cron properly and enabled log cleaning in Magento settings, the table report_viewed_product_index
is not cleaned properly. Let’s take care of this:
@daily /usr/bin/n98-magerun db:query "DELETE FROM report_viewed_product_index WHERE visitor_id IS NULL and customer_id IS NULL" > /dev/null
Finally, if you want to go hardcore and trim and truncate all you want, read this article about all the tables you can truncate without affecting your store function.
To get information about the sizes of log tables in Magento, use:
php -f shell/log.php status
Huge log_visitor_info
table
The table log_visitor_info
may grow too large especially if the Magento setting Save Log, Days
is set too high, resulting in too many entries/data.
This table can also grow as a result of a DoS attack.
Ensure that the setting Save Log, Days
(Configuration -> System
) is set to no more than 90 days.
Subsequently, apply cleaning for reducing existing data after changing the setting’s value:
n98-magerun sys:cron:run log_clean
Magento is by far the most widely used Ecommerce self-hosted platform. But it needs very thorough approach when it comes to performance. One important factor that affects Magento’s performance is its database size. Be sure to check our full ultimate Magento performance checklist for all the tips and tricks on getting Magento to run as fast as possible.
Or save yourself lots of time and let me configure and setup Magento server for you.