Serial and Serializable Schedules⌗

Serial schedule: no mixing of actions from different transactions.

Serializable schedule: if there is a serial schedule S’ such that for every initial database state, the effects of S and S’ are the same.

The details of the transactions do matter, but it is not realistic for the scheduler to concern itself with the details of computation undertaken by transactions.

We only care about the reads $r_T(X)$ and writes $w_T(X)$ performed by the transaction $T$.

Conflict-Serializability⌗

Actions that conflict (i.e. their order may not be swapped):

1. Two actions of the same transaction.
2. Two writes of the same database element by different transactions.
3. A read and a write of the same database element by different transactions.

In other words, for actions from different transactions, they conflict when:

1. They involve the same database element, and
2. At least one is a write.

Conflict-equivalent: two schedules can be turned one into the other by a sequence of nonconflicting swaps of adjacent actions.

Conflict-serializable: the schedule is conflict-equivalent to a serial schedule

conflict-serializability $\Rightarrow$ serializability

Precedence: if there are actions $A_1$ of $T_1$ and $A_2$ of $T_2$, such that:

• $A_1$ is ahead of $A_2$ in the schedule $S$,
• Both $A_1$ and $A_2$ involve the same database element, and
• At least one of $A_1$ and $A_2$ is a write action.

then we say that $T_1$ takes precedence over $T_2$ in $S$ ($T_1<_ST_2$).

Any conflict-equivalent serial schedule must have $T_1$ before $T_2$.

If the precedence graph is acyclic, then the schedule is conflict-equivalent.

Proof: for $n$, arrange the schedule by (Actions of $T_i$)(Actions of the other $n-1$ transactions) where $T_i$ is one node without an in arc.

Enforcing Serializability by Locks⌗

The use of lock:

• Consistency of Transactions: Actions and locks must relate in the expected ways:
1. A transaction can only read or write an element if it previously was granted a lock on that element and hasn’t yet released the lock.
2. If a transaction locks an element, it must later unlock that element.
• Legality of Schedules: Locks must have their intended meaning: no two transactions may have locked the same element without one having first released the lock.

The two-phase locking condition than can ensure conflict-serializability:

• In every transaction, all lock actions precede all unlock actions.

A transaction that obeys the 2PL condition is said to be a two-phase-locked transaction, or 2PL transaction.

Proof: for $n$, arrange the schedule by (Actions of $T_i$)(Actions of the other $n-1$ transactions) where $T_i$ is the transaction with the first unlock action $u_iX$.

Locking Systems With Several Lock Modes⌗

Lock modes:

• shared
• exclusive

The compatibility matrix:

Request SRequest X
Currently SYesNo
Currently XNoNo

Avoid the problem with a third lock mode: update locks.

• An update lock $ul_i(X)$ gives transaction $T_j$ only the privilege to read X, not to write X;
• Only the update lock can be upgraded to a write lock later; a read lock cannot be upgraded.

The compatibility matrix:

Request SRequest XRequest U
Currently SYesNoYes
Currently XNoNoNo
Currently UNoNoNo

The columns for U and S locks are the same, and the rows for U and X locks are the same.

And now:

An Architecture for a Locking Scheduler⌗

The architecture discussed here: the scheduler insert and release locks.

1. Part 1: inserts appropriate lock actions ahead of all database-access operations
2. Part 2: for received action,
1. if the issuing transcation is already delayed, then delay this action
2. otherwise,
1. if the action is db access (the required locks are already been granted), execute it
2. if is lock action
1. if the lock can be granted: modify the lock table
2. otherwise, request the lock and delay
3. When a transcation commits or aborts, Part 1 releases all locks held by the transaction; if any transactions are waiting for any of these locks, notify Part 2
4. Part 2 checks which actions can be executed now

The lock table entry:

1. The group mode: the most stringent conditions
2. The waiting bit: at least one transaction is waiting
3. A list of transactions that hold or waiting for the lock

Hierarchies of Database Elements⌗

Warning locks: useful for hierarchical structure.

ISIXSX
ISYesYesYesNo
IXYesYesNoNo
SYesNoYesNo
XNoNoNoNo

The phantom tuple: one that should have been locked but wasn’t, because it didn’t exist at the time the locks were taken.

Solution: acquire an X lock on the entire relation.

The Tree Protocol⌗

The problem: only one transaction that is not read-only can access the B-tree at any time, because the root node must be locked (at least with an update lock).

Solution: as soon as a transaction moves to a child of the root and observes the (quite usual) situation that rules out a rewrite of the root, we would like to release the lock on the root; but it violates 2PL, so a new protocol is needed.

The tree protocol:

1. The schedule is legal (i.e. does not violate lock schemes).
2. A transaction’s first lock may be at any node of the tree.
3. Subsequent locks may only be acquired if the transaction currently has a lock on the parent node.
4. Nodes may be unlocked at any time.
5. A transaction may not relock a node on which it has released a lock, even if it still holds a lock on the node’s parent.

Conclusion: the precedence graph must always be acyclic if the tree protocol is obeyed. Proof:

• If two transactions lock several elements in common, then they are all locked in the same order.
• If $T_i$ locks the root before $T_j$, then $T_i$ locks every node in common with $T_j$ before $T_j$ does.

Concurrency Control by Timestamps⌗

Two approaches to generate timestamp:

1. use the system clock
2. the scheduler maintaining a counter

For the database element $X$:

1. $RT(X)$: the read time of $X$
2. $WT(X)$: the write time of $X$
3. $C(X)$: the commit bit fot $X$, true when the most recent transaction to write $X$ has already commited.

Two kinds of unrealizable behaviours:

The problem with dirty data:

$T$ better delays until $U$ commits.

The commit bit will be false.

The Thomas write rule: the writes can be skipped when a write with a later write-time is already in place (read intention for value written by T is too-late, as the value is written by a logically latter transaction).

The scheduler, in response to a read or write request from a transaction T has the choice of:

• Granting the request,
• Aborting T and restarting T with a new timestamp
• Delaying T and later deciding whether to abort or grant the request.

The rules for timestamp-based scheduling:

1. The scheduler receives a request $r_T(X)$.
1. $TS(T)\ge WT(X)$: physically realizable.
1. $C(X)$ is true: grant the request; set $RT(X)$ if $TS(T)>RT(X)$.
2. $C(X)$ is false: delay until becomes true, or that transaction aborts (to avoid dirty reads).
2. $TS(T)<WT(X)$: physically unrealizable, rollback $T$.
2. The scheduler receives a request $w_T(X)$.
1. $TS(T) \ge RT(X)$ and $TS(T) > WT(X)$: physically realizable and must be performed (?).
1. Write $X$,
2. Set $WT(X) := TS(T)$, and
3. Set $C(X) := \texttt{false}$.
2. $TS(T) \ge RT(X)$ but $TS(T) < WT(X)$: physically realizable, and
1. If $C(X)$ is true, ignore the write by T (the thomas write rule);
2. otherwise, delay T until the previous write operation commits or aborts.
3. $TS(T) < RT(X)$: physically unrealizable, roll back T.
3. The scheduler receives a request to commit T: find all elements written by T, and set $C(X) := \texttt{true}$; the delayed transactions are allowed to proceed.
4. The scheduler receives a request to abort T, or to rollback: check all delaying transcation waiting for any element that T wrote to see if they can proceed.

The use of multiversion timestamps: to provide an appropriate version for a read action (the version of X written immediately before T theoretically executed), even if the latest version is written after the transcation.

Comparision of locking and timestamp schemes:

Generally, timestamping is superior in situations where either most transactions are read-only, or it is rare that concurrent transactions will try to read and write the same element. In high-conflict situations, locking performs better.

• Locking will frequently delay transactions as they wait for locks.
• If concurrent transactions frequently read and write elements in common, then rollbacks will be frequent in a timestamp scheduler.

Concurrency Control by Validation⌗

Validation differs from timestamping principally in that the scheduler maintains a record of what active transactions are doing, rather than keeping read and write times for all database elements.

Denote the write set and read set as $WS(T)$ and $RS(T)$. Transactions are executed in three phases:

1. Read. The transaction reads from the database all the elements in its read set, and also computes (locally) the results it is going to write.
2. Validate. The transaction compares its read and write sets with those of other transactions; rollback if fails.
3. Write. The transaction writes to the database the values for the elements in its write set.

Thus, the validation-based scheduler has an assumed serial order of the transactions to work with, and it bases its decision to validate or not on whether the transactions’ behaviors are consistent with this serial order.

The scheduler maintains three sets:

1. START: the set of transactions that have started, but not yet completed validation.
2. VAL: the set of transactions that have been validated but not yet finished the writing in phase 3.
3. FIN: the set of transactions that have completed phase 3.

Some examples:

A later transaction might read a value before an earlier transaction writes the value.

A later transaction might write before a validated, earlier transaction, as the write of the later one might occur earlier.

Summarization: to validate a transaction T:

• Check that $RS(T) \cap WS(U) = \emptyset$ for any previously validated U that did not finish before T started, i.e., if $FIN(U) > START(T)$.
• Check that $WS(T) \cap WS(U) = \emptyset$ for any previously validated U that did not finish before T validated, i.e., if $FIN(U) > VAL(T)$.

Comparison of Three Concurrency-Control Mechanisms⌗

Storage: all schemes (can be optimized to) use storage space proportional to the sum over all active transactions of the number of database elements the transaction accesses.

Performace:

• Locking delays transactions but avoids rollbacks, even when interaction is high.
• Timestamps and validation do not delay transactions, but can cause them to rollback.
• If interference is low, then neither timestamps nor validation will cause many rollbacks, and may be preferable to locking because they generally have lower overhead than a locking scheduler.
• When a rollback is necessary, timestamps catch some problems earlier than validation, which always lets a transaction do all its internal work before considering whether the transaction must rollback.