Full Persistence

Full Persistence Better Motivation, Different Thinking Sung-Soo Kim's Blog Home Tags Categories Archive Full Persistence Tags algorithms 7 19 January 2014 Version Diagrams Gray means version is read only and blue means version is read-write. Figure 1. Full Persistence Full persistence In fully persistent model, both updates and queries are allowed on any version of the data structure.The construction for partial persistence can be expanded to implement full persistence. This result is also due to [5]. We again assume a pointer machine with p < O(1) incoming pointers per node. We need to take care of a two new problems: Now we need to represent versions in a way that lets us efficiently check which precedes which. Versions form a tree, but traversing it is O(# of versions). Now we have to support writes to any version. This affects how we handle writes: we no longer have the concept of ‘current’ vs. ‘old’ nodes: every node needs to support writes. Version representation The version representation problem is solved by keeping a tree representing the version structure, as well as an efficient representation of this tree in a linearized way. The linearized representation of the tree in the following figure is ba, bb, bc, ec, eb, bd, ed, ea. You can read ba as ‘begin node a’ and ea as ‘end node a’. This representation losslessly encodes the tree, and we can directly answer queries about the tree using that encoded representation. For example we know c is nested within b, since bb < bc and ec < eb. Figure 2. in-order tree traversal This linearized representation can be implemented using an ‘order maintenance’ data structure. For now, it suffices to say an order maintenance data structure supports the following two operations, both in O(1) time. insert an item before or after a specified element. check if item s precedes item t. For example, a linked list supports insertions in O(1), but tests for precedence take O(n). Similarly, a balanced BST supports both operations but in O(log n) time. Deitz and Sleator show an O(1) implementation for both operations in @dietz, which will be covered in lecture 8 . To implement version tree queries such as ‘is version v an ancestor of version w’ we can use two comparison queries bv < bw and ew < ev in O(1). To implement updates like ‘add version v as a child of version w’ we can insert the two elements bv and ev after bw and before ew respectively, also in O(1). Construction and algorithm: The nodes in our data structure will keep the same kinds of additional data per node as they did in the partially persistent case. For each node we store d data entries and p back pointers, but now allow up to 2(d+p+1) modifications. The amount of data d is also a bound on out-pointers per node. Additionally we now also version back-pointers. read(n.field, version): By using the order-maintenance data structure we can pick the most recent ancestor of version from among the entries in the mod log and return that value. write(n.field, value, version): If there is space in node, just add mod entry. else: m = new Node(). Split the contents of node n’s mod log into two parts following the diagram figure 3. Partitioning into subtrees rather than arbitrarily is required. Now node m has some of the mods of the internal tree in Figure 3, and node n retains the ‘older’ half of the updates. from the ‘old’ mod entries in node n, compute the latest values of each field and write them into the data and back pointer section of node m. recursively update all (up to) d + p + (d + p + 1) forward and backward pointers of neighbors. insert new version to our version tree representation. Figure 3. Splitting a tree-shaped version genealogy into two subtrees Analysis: Space – 1 if we do not split or d + p + 2(d + p + 1) = 3d + 3p + 2 when we split a node, both O(1) Time – read(var, version) is implemented in a similar way to the partial case. We use our auxiliary version tree data structure to find the largest ancestor of version from among a list of O(1) elements in O(1). Like with the partial case, writes are cheap when a node has space in its mods log and more expensive when nodes are full. Consider ϕ =-c(# empty slots), then when we split Δϕ=-2c(d+p+1) and when we do not Δϕ=c. Hence, for the worst possible choice of x from the neighbors. When we unfold the recursion once, we find the constants cancel out: c - 2c(d+p+1) + (2p + 2p + 1)c = 0. OPEN: De-amortization of full persistence. OPEN: Is there a matching lower bound for both full and partial persistence? References [1] Gerth Stolting Brodal: Partially Persistent Data Structures of Bounded Degree with Constant Update Time. Nord. J. Comput. 3(3): 238-255 (1996) [2] Erik D. Demaine, John Iacono, Stefan Langerman: Retroactive data structures. SODA 2004: 281-290 [3] Erik D. Demaine, Stefan Langerman, and Eric Price: Confluently Persistent Tries for Efficient Version Control. Algorithmica (2008). [4] Paul F. Dietz, Daniel Dominic Sleator: Two Algorithms for Main