KEY - type of keysVALUE - type of valuespublic class GBPTree<KEY,VALUE> extends Object implements Closeable
PageCache with no caching in between.
Additionally internal and leaf nodes on same level are linked both left and right (sibling pointers),
this to provide correct reading when concurrently modifying
the tree.
Generation is incremented on check-pointing.
Generation awareness allows for recovery from last checkpoint(IOLimiter), provided the same updates
will be replayed onto the index since that point in time.
Changes to tree nodes are made so that stable nodes (i.e. nodes that have survived at least one checkpoint) are immutable w/ regards to keys values and child/sibling pointers. Making a change in a stable node will copy the node to an unstable generation first and then make the change in that unstable version. Further change in that node in the same generation will not require a copy since it's already unstable.
Every pointer to another node (child/sibling pointer) consists of two pointers, one to a stable version and one to a potentially unstable version. A stable -> unstable node copy will have its parent redirect one of its two pointers to the new unstable version, redirecting readers and writers to the new unstable version, while at the same time keeping one pointer to the stable version, in case there's a crash or non-clean shutdown, followed by recovery.
A single writer w/ multiple concurrent readers is supported. Assuming usage adheres to this constraint neither writer nor readers are blocking. Readers are virtually garbage-free.
An reader of GB+Tree is a SeekCursor that returns result as it finds them.
As the cursor move over keys/values, returned results are considered "behind" it
and likewise keys not yet returned "in front of".
Readers will always read latest written changes in front of it but will not see changes that appear behind.
The isolation level is thus read committed.
The tree have no knowledge about transactions and apply updates as isolated units of work one entry at the time.
Therefore, readers can see parts of transactions that are not fully applied yet.
A note on recovery:
GBPTree is designed to be able to handle non-clean shutdown / crash, but needs external help
in order to do so.
Writes happen to the tree and are made durable and
safe on next call to checkpoint(IOLimiter). Writes which happens after the last
checkpoint(IOLimiter) are not safe if there's a close() or JVM crash in between, i.e:
w: write
c: checkpoint
x: crash or close()
TIME |--w--w----w--c--ww--w-c-w--w-ww--w--w---x------|
^------ safe -----^ ^- unsafe --^
The writes that happened before the last checkpoint are durable and safe, but the writes after it are not.
The tree can however get back to a consistent state by replaying all the writes since the last checkpoint
all the way up to the crash (x). Even including writes before the last checkpoint is OK,
important is that at least writes since last checkpoint are included. Note that the order
in which the changes are applied is not important as long as they do not affect the same key. The order of
updates targeting the same key needs to be preserved when replaying as only the last applied update will
be visible.
If failing to replay missing writes, that data will simply be missing from the tree and most likely leave the
database inconsistent.
The reason as to why close() doesn't do a checkpoint is that checkpointing as a whole should
be managed externally, keeping multiple resources in sync w/ regards to checkpoints. This is especially important
since a it is impossible to recognize crashed pointers after a checkpoint.
| Modifier and Type | Class and Description |
|---|---|
static interface |
GBPTree.Monitor
For monitoring
GBPTree. |
| Modifier and Type | Field and Description |
|---|---|
static GBPTree.Monitor |
NO_MONITOR
No-op
GBPTree.Monitor. |
| Constructor and Description |
|---|
GBPTree(org.neo4j.io.pagecache.PageCache pageCache,
File indexFile,
Layout<KEY,VALUE> layout,
int tentativePageSize,
GBPTree.Monitor monitor,
Header.Reader headerReader)
Opens an index
indexFile in the pageCache, creating and initializing it if it doesn't exist. |
| Modifier and Type | Method and Description |
|---|---|
void |
checkpoint(org.neo4j.io.pagecache.IOLimiter ioLimiter)
Performs a
check point, keeping any header information
written in previous check point. |
void |
checkpoint(org.neo4j.io.pagecache.IOLimiter ioLimiter,
Consumer<org.neo4j.io.pagecache.PageCursor> headerWriter)
Checkpoints and flushes any pending changes to storage.
|
void |
close()
Closes this tree and its associated resources.
|
org.neo4j.cursor.RawCursor<Hit<KEY,VALUE>,IOException> |
seek(KEY fromInclusive,
KEY toExclusive)
Seeks hits in this tree, given a key range.
|
String |
toString() |
Writer<KEY,VALUE> |
writer()
Returns a
Writer able to modify the index, i.e. |
public static final GBPTree.Monitor NO_MONITOR
GBPTree.Monitor.public GBPTree(org.neo4j.io.pagecache.PageCache pageCache,
File indexFile,
Layout<KEY,VALUE> layout,
int tentativePageSize,
GBPTree.Monitor monitor,
Header.Reader headerReader)
throws IOException
indexFile in the pageCache, creating and initializing it if it doesn't exist.
If the index doesn't exist it will be created and the Layout and pageSize will
be written in index header.
If the index exists it will be opened and the Layout will be matched with the information
in the header. At the very least Layout.identifier() will be matched.
On start, tree can be in a clean or dirty state. If dirty, it will
clean crashed pointers as part of constructor. Tree is only clean if since last
time it was opened it was closed without any non-checkpointed changes present. Correct usage
pattern of the GBPTree is:
try ( GBPTree tree = new GBPTree(...) )
{
// Use the tree
tree.checkpoint( ... );
}
pageCache - PageCache to use to map index fileindexFile - File containing the actual indexlayout - Layout to use in the tree, this must match the existing layout
we're just opening the indextentativePageSize - page size, i.e. tree node size. Must be less than or equal to that of the page cache.
A pageSize of 0 means to use whatever the page cache has (at creation)monitor - GBPTree.Monitor for monitoring GBPTree.headerReader - reads header data, previously written using checkpoint(IOLimiter, Consumer)
or close()IOException - on page cache errorpublic org.neo4j.cursor.RawCursor<Hit<KEY,VALUE>,IOException> seek(KEY fromInclusive, KEY toExclusive) throws IOException
RawCursor.
There's no guarantee that neither the Hit nor key/value instances are immutable and so
if caller wants to cache the results it's safest to copy the instances, or rather their contents,
into its own result cache.
Seeks can go either forwards or backwards depending on the values of the key arguments.
fromInclusive that is smaller than the toExclusive results in results in ascending order.
fromInclusive that is bigger than the toExclusive results in results in descending order.
fromInclusive - lower bound of the range to seek (inclusive).toExclusive - higher bound of the range to seek (exclusive).RawCursor used to iterate over the hits within the specified key range.IOException - on error reading from index.public void checkpoint(org.neo4j.io.pagecache.IOLimiter ioLimiter,
Consumer<org.neo4j.io.pagecache.PageCursor> headerWriter)
throws IOException
Changes made after this call and until crashing or
otherwise non-clean shutdown (by omitting calling checkpoint before close()) will need to be replayed
next time this tree is opened.
Header writer is expected to leave consumed PageCursor at end of written header for calculation of
header size.
ioLimiter - for controlling I/O usage.headerWriter - hook for writing header data, must leave cursor at end of written header.IOException - on error flushing to storage.public void checkpoint(org.neo4j.io.pagecache.IOLimiter ioLimiter)
throws IOException
check point, keeping any header information
written in previous check point.ioLimiter - for controlling I/O usage.IOException - on error flushing to storage.checkpoint(IOLimiter, Consumer)public void close()
throws IOException
NOTE: No checkpoint is performed.
close in interface Closeableclose in interface AutoCloseableIOException - on error closing resources.public Writer<KEY,VALUE> writer() throws IOException
Writer able to modify the index, i.e. insert and remove keys/values.
After usage the returned writer must be closed, typically by using try-with-resource clause.Writer for this index. The returned writer must be
closed before another caller can acquire this writer.IOException - on error accessing the index.IllegalStateException - for calls made between a successful call to this method and closing the
returned writer.Copyright © 2002–2017 The Neo4j Graph Database Project. All rights reserved.