Tuesday, May 20, 2008

MySql innodb transaction/logs



- MySql's innodb follows Oracle's multi-version read consistency model. It's multi version is based on undo log. In Innodb, each table has two hidden columns, a transaction number and a pointer to the previous version of that row in the undo log. With the rollback pointer, READ COMMITTED can go back one step to the past, whereas REPEATABLE READ can go back multiple steps until it reaches the newest version after its transaction starts. Based on the transaction number, we know particular row in data buffer is associated with committed or uncommitted transaction.

- Write-Ahead logging (WAL) principal: all modifications are written to a log before they are applied to the database. It is to guarantee the Atomicity and Durability of ACID. WAL significantly reduces the number of disk writes, since only the log file needs to be flushed to disk at the time of transaction commit. In multi-user environments, commits of many transactions may be accomplished with a single fsync() of the log file. The log file is written sequentially, so the cost of syncing the log is much less than the cost of flushing the data pages. Writing back change to table space involves slow disk seek rather than liner write. Another benefit of WAL is consistency of the data pages.

- Two buffers: log buffer and data buffer in InnoDb. Log buffer is for insert/update, whereas data buffer is for all operations, it used to cache data and indexes of the tables. Also there are some other memory buffer such as sort buffer and query cache buffer. When a transaction commit, the rows in log buffer are flushed to disk, whereas the rows in data buffer are marked as dirty but not flushed. When innodb_flush_log_at_trx_time set to 0, even the log buffer is not flushed (not recommended). There are several threads monitoring buffer activity, and three situations -- overflow, checkpoint, and commit -- that result in disk writes.

- InnoDB uses both row level locking and multi-versioning concurrency control (MVCC). Multi-version Consistent read is the default mode in which InnoDB processes SELECT statements in READ COMMITTED and REPEATABLE READ isolation levels. FOR UPDATE or IN SHARE MODE apply exclusive (X) lock and shared (S) lock.

- Checkpoint: InnoDB takes some pro-active measures against overflows, and the most important of these measures is checkpointing. There is a separate thread, or a combination of threads that are separate from the thread that changes the buffers. At fixed intervals the checkpointer will wake, look for buffer changes, and ensure that writes happen.

See:
Transactions - An InnoDB tutorial
How Logs work on MySQL with InnoDB tables

No comments: