Thursday, July 14, 2011

Slow InnoDB insert?

InnoDB has different characteristics which myisam doesn't have. Some of them need extra time. Table space management, transaction logging, writing to rollback segment, ...

Generally the insertion speed of InnoDB is slower than MyISAM.

1) Row size
Run "show table status", you will see for the same table, the avg_row_length for are different. MyISAM is 84 bytes, but InnoDB is 116, and could be more depending on how PK is defined. MyISAM uses heap storage, the new inserted records are basically appended to the data file without sorting; whereas InnoDB uses a cluster indexed storeage, i.e., the whole table is structured as a sorted BTree, the data records on the leaf nodes. The inernal nodes is comprised of PK. So the table size is not calculated as rowSize*numOfRows. There is a overhead.

2) Primary Key
Since InnoDB stores data at the leaves of the primary index the primary key should be kept as short as possible. And if the primary key is not a int that just increments linearly each insert of data can't be made at the end of the table but in the middle which forces some data shuffling.

I tried table partitition and MySQL requires the partition columns must be part of PK. So I chose a composite PK (mac, partId, recordTime), partId is an auxiliary colulmn, a hash code (0..9) generated from mac. This is becasue the hash type partition doesn't support UDF (user defined function). Since the PK value of it is in arbitrary order while inserting and causes data shuffling. The insertion of the partitioned table is much slower (5-317 seconds for 50k records) than the one without partition and with an auto-increment long PK (5-30 seconds). And MyISAM is the fastest, 5-10 seconds.

3) InnoDB configuration
The article Cofnigure InnoDB explains this subject very well. These setting affects the InnoDB performance a lot. Among them, innodb_log_file_size, innodb_flush_log_at_trx_commit are important for insertion to reduce disk I/O. Note that when innodb_log_file_size is changed, we need to remove the old log files and restart mysql server.

innodb_data_file_path = ibdata1:15M:autoextend

# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
innodb_buffer_pool_size = 1024M
innodb_additional_mem_pool_size = 16M

# Set .._log_file_size to 25 % of buffer pool size
innodb_log_file_size = 512M
innodb_log_buffer_size = 256M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50
innodb_file_per_table

Run "show innodb status" to know the innodb runtime status as explained in SHOW INNODB STATUS walk through.
By monitoring log sequence number and value up to which logs have been flushed you can check if your innodb_log_buffer_size is optimal - if you see more than 30% of log buffer size being unflushed you may want to increase it.

From these values you can learn if your buffer pool is sized well - if you have constantly a lot of pages free, it probably means your active database size is smaller than allocated buffer pool size so you can tune down innodb_buffer_pool_size.

4) JDBC batch size
InnoDB is transaction safe. Turn of autoCommit and use JDBC batch insert should be faster. But surprisingly, there are not performance improvement when I tried the batch insert. It is becasue MySQL in general doesn't have a "batch" form of prepared statement parameter bindings(?). But MySQL support batch in a different way. By adding rewriteBatchedStatements into url (e.g., jdbc:mysql://localhost:4589/mydb?rewriteBatchedStatements=true), it will collapse multiple inserts into one based on the batch size, e.g., "INSERT INTO foo VALUES (...), (...), (...)". The insertion performance is further improved from 10 seconds to 5 seconds, i.e., 5000 records per second.

Hibernate's StatelessSession doesn't repect the hibernate.jdbc.batch_size setting, so couldn't do a batch insert with it. One trick to view the actaul running sql statement is run "show innodb status".

No comments: