Thursday, 18 June, 2020
Development Diary #4
Welcome to a new instalment of our development journey…
XTDB is a database that prioritises flexibility above all else:
- flexibility for system architects to mix & match storage technologies that align with their requirements, expertise and budget
- flexibility for data architects to capture and accommodate evolving business domains after the fact
- flexibility for developers to efficiently combine their code with point-in-time graph queries, embeddable directly within a JVM application
However, our most crucial measure of flexibility is the core development team’s ability to evolve XTDB itself and support increasingly complex requirements without sacrificing internal code simplicity or dramatically altering the underlying designs. To this end we have been very busy since the site:/blog/dev-diary-jan-20/[previous diary entry].
The fully-remote XTDB team has been humming along unencumbered these past few months, and consequently, since the 1.6 release there have been 6 new releases, including many useful user-facing changes.
Yesterday we released 1.9.0, which is a meaningful milestone for feature and API stability. During this period we have implemented a variety of internal refactorings that have provided substantial performance and storage improvements to the indexes and query engine. For certain use-cases where documents are frequently modified, we have reduced the overall disk space used by the indexes by 45-60% compared with 1.8.4. Our optimisations in the query engine have resulted in a 25-30% improvement across a reasonable subset of the WatDiv query suite in our nightly benchmarks.
The 1.9 release also introduces many new features, most notably Transaction Functions, which have unlocked a whole new layer of architectural flexibility for users. See the release notes for the full details of 1.9, but let’s look at these significant new features from the perspective of the original requirements.
Transaction Functions
:: Requirement: Users can express changes to the database with more advanced control and granularity than using basic document-oriented transaction operations
Until now, all data has been submitted to XTDB using native put operations, which operate on entire documents. Therefore you cannot put half a document, and equally you cannot delete a single attribute from a document. However, it is a common expectation and requirement to be able to update one or more documents based on the current values contained within the previous versions of those documents.
The classic example is maintaining a simple monotonically incrementing counter. The existing options for creating such a counter with XTDB have been:
- Optimistically put a document with the new count value, e.g. {:xt/id :my-counter :count <n + 1>}, whilst also using a match operation to ensure that the current document version of :my-counter still looks as expected, e.g. {:xt/id :my-counter :count <n>} (this is necessary to maintain consistency and prevent race conditions between concurrent transactions)
- Funnel all transaction submissions through a single "gatekeeper" node and use something like ZooKeeper to handle failover
The first option is less than ideal, since it requires excess data and churn on the transaction log, particularly where multiple nodes may be writing transactions to update the counter at the same time. The risk of contention and reduced throughput is a major downside, as is the complexity of the user code needed to implement backoff and retry logic.
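To make the contention problem concrete, here is a toy, in-memory Clojure model of the optimistic approach. It does not use the XTDB API: `apply-tx`, `increment-tx` and `index-log` are hypothetical helpers, and plain `:match`/`:put` keywords stand in for XTDB's match and put operations. Two clients read the same counter value and both submit an increment; because transactions are indexed one at a time, the second match no longer holds.

```clojure
;; Toy, in-memory model of the optimistic approach (not the XTDB API).
;; A transaction is a vector of ops, indexed one transaction at a time:
;;   [:match eid expected-doc] aborts the whole tx unless the current doc equals expected-doc
;;   [:put doc]                replaces the entity's document
(defn apply-tx
  "Returns [db committed?]; a failed :match discards the whole transaction."
  [db tx]
  (loop [db' db
         ops (seq tx)]
    (if-let [[kind a b] (first ops)]
      (case kind
        :match (if (= (get db' a) b)
                 (recur db' (rest ops))
                 [db false])                       ; keep the pre-tx state
        :put   (recur (assoc db' (:xt/id a) a) (rest ops)))
      [db' true])))

(defn increment-tx
  "The transaction a client submits after reading the counter document `seen`."
  [seen]
  [[:match :my-counter seen]
   [:put (update seen :count inc)]])

(defn index-log
  "Indexes transactions in log order, returning [db commit-results]."
  [db log]
  (reduce (fn [[db results] tx]
            (let [[db' ok?] (apply-tx db tx)]
              [db' (conj results ok?)]))
          [db []]
          log))

;; Two clients read the same counter value and each submit an increment.
(def initial-db {:my-counter {:xt/id :my-counter :count 0}})
(def seen (get initial-db :my-counter))
(def result (index-log initial-db [(increment-tx seen) (increment-tx seen)]))
;; (second result) => [true false]
```

The second transaction is rejected, so that client has to re-read :my-counter and resubmit; under heavy write load this loop is where the backoff and retry logic, and the wasted log entries, come from.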
The second option is still a reasonable choice but generally runs against the grain of the XTDB philosophy, because centralising authority at a single node for writing to the transaction log undermines the scalability benefits of using clustered technology like Kafka for highly-available and durable write throughput.
Our solution to this problem broadly eliminates the need for any kind of gatekeeper node and opens up new possibilities for transactional architecture and data modelling. We have designed a feature for expressing custom transaction operations inside your transactions. These user-supplied function operations are called "Transaction Functions", and they are invoked deterministically during the initial indexing of the transaction log.
The use of transaction functions also simplifies the contents of the transaction log by more explicitly capturing the intent of the operations.
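;; submit a transaction that invokes :increment-counter for :my-counter
;; (assumes (require '[xtdb.api :as xt]) and a started node)
(xt/submit-tx node [[::xt/fn :increment-counter :my-counter]])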
:: Requirement: Users can use transaction functions to conditionally update the database atomically based on custom logic with query access to the current database via a provided context
Each transaction function is installed via a put operation, and all invocation arguments are stored separately in the document store. Once invoked as an operation, a transaction function has access to a context against which you can run a query, and this is how you can update a counter based on its current value. The result of invoking a transaction function is a list of one or more operations, which are spliced into the transaction in place of the calling operation.
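;; operations that install :increment-counter as a transaction function document
[[::xt/put {:xt/id :increment-counter
            :xt/fn '(fn [ctx eid]
                      ;; ctx provides query access to the current database
                      (let [db (xtdb.api/db ctx)
                            entity (xtdb.api/entity db eid)]
                        ;; the returned ops replace the calling ::xt/fn operation
                        [[::xt/put (update entity :count inc)]]))}]]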
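The splicing step can be pictured with a toy, pure-Clojure model. This is not XTDB's implementation: the `tx-fns` registry, `expand-op`, `apply-op` and `index-tx` are hypothetical names, and plain `:fn`/`:put` keywords stand in for XTDB's operations. A function op is expanded against the current state at indexing time and replaced by the native ops it returns.

```clojure
;; Toy model of transaction-function expansion (not XTDB's implementation).
;; A hypothetical registry maps function ids to fns of [db & args] that return
;; native ops; in XTDB the function instead lives under :xt/fn in a document.
(def tx-fns
  {:increment-counter (fn [db eid]
                        [[:put (update (get db eid) :count inc)]])})

(defn expand-op
  "Replaces a [:fn fn-id & args] op with the ops returned by that function;
  any other op is returned unchanged."
  [db [kind & args :as op]]
  (if (= kind :fn)
    (let [[fn-id & fn-args] args]
      (apply (get tx-fns fn-id) db fn-args))
    [op]))

(defn apply-op
  "Applies a single native op to the toy state."
  [db [kind doc]]
  (case kind
    :put (assoc db (:xt/id doc) doc)))

(defn index-tx
  "Expands each op against the current toy state, then applies the resulting ops."
  [db tx]
  (reduce (fn [db op]
            (reduce apply-op db (expand-op db op)))
          db
          tx))

(def db0 {:my-counter {:xt/id :my-counter :count 41}})
;; (expand-op db0 [:fn :increment-counter :my-counter])
;; => [[:put {:xt/id :my-counter, :count 42}]]
```

Because the function reads the counter at indexing time rather than at submission time, two increments submitted independently both take effect (41 becomes 43) with no match or retry required.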
Nodes which are subsequently indexing the transaction log will not have to repeat this processing of the transaction function operations because the argument documents (to which the transaction log refers under-the-hood) are idempotently mutated and replaced with the resulting native operations. In other words, each transaction function invocation replaces itself with its result in the upstream document store, and this maintains consistency whilst not precluding later eviction operations on the data generated within the results.
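The replacement of argument documents can also be sketched as a toy model, again not XTDB's actual code. Here `resolve-arg-doc` is a hypothetical helper, and the argument document holds a Clojure function value purely to keep the example short (a real argument document would reference the function by id). The first indexer evaluates the call and overwrites the document with the resulting ops; a later indexer reads those ops back without re-running the function.

```clojure
;; Toy model (not XTDB's implementation) of replacing an argument document
;; with the native ops that a transaction function call produced.
(def invocations (atom 0)) ; how many times the function body actually runs

(defn increment-counter [db eid]
  (swap! invocations inc)
  [[:put (update (get db eid) :count inc)]])

(defn resolve-arg-doc
  "Returns [doc-store ops]. An arg doc holding an unevaluated :call is
  evaluated once and overwritten with {:ops result}; later readers reuse it."
  [doc-store db arg-doc-id]
  (let [{:keys [call ops]} (get doc-store arg-doc-id)]
    (if ops
      [doc-store ops]                                   ; already replaced
      (let [[f & args] call
            result (apply f db args)]
        [(assoc doc-store arg-doc-id {:ops result}) result]))))

(def db0 {:my-counter {:xt/id :my-counter :count 0}})
;; the tx log refers to :arg-1; the document store initially holds the call itself
(def store0 {:arg-1 {:call [increment-counter :my-counter]}})

(def first-indexer (resolve-arg-doc store0 db0 :arg-1))
;; a node indexing later sees the replaced document, not the original call
(def later-indexer (resolve-arg-doc (first first-indexer) db0 :arg-1))
```

After both indexers have run, the function body has executed only once and both see the same native put, which is also why evicting data produced by the function later remains possible: it now exists only as ordinary operations.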
Note that we are also working on a speculative transaction capability for transaction functions right now (coming very soon!):
:: Requirement: Users can express constraints and invariants with Datalog inside of transaction functions
Some keen-eyed users will have already spotted that we previously implemented a variation of the transaction function feature and kept it hidden behind a feature flag. We decided to keep that version disabled by default because the operational design was largely incompatible with the use of eviction, where the combination of the two features could too easily lead to inconsistency, and so it was only appropriate for usage by a very narrow set of users. The new implementation conveniently avoids those problems by replacing the argument documents with the resulting operations.
Collections within queries
:: Requirement: Users can express complex queries more succinctly, e.g. where set literals can be used in e or v positions, and predicates can return sets
Queries that would previously require many additional clauses can now be compactly expressed thanks to treating collection literals and predicate return values as sets.
;; for example, with these documents submitted:
{:xt/id :hobbits, :members #{:frodo :sam :merry :pippin}}
{:xt/id :three-hunters, :members #{:aragorn :legolas :gimli}}
(xt/q db '{:find [?group], :where [[?group :members #{:frodo :aragorn}]]})
;; => #{[:hobbits] [:three-hunters]}
(xt/q db '{:find [?group], :where [[?group :members ?member]
                                   [(vector :frodo :aragorn) ?member]]})
;; => #{[:hobbits] [:three-hunters]}
(xt/q db '{:find [?member], :where [[#{:hobbits :three-hunters} :members ?member]]})
;; => #{[:frodo] [:sam] [:merry] [:pippin] [:aragorn] [:legolas] [:gimli]}
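The semantics of a set in the e or v position can be illustrated with a tiny triple matcher. This is a toy sketch, not XTDB's query engine: `triples`, `term-matches?` and `match-clause` are hypothetical helpers, and logic variables are treated as simple wildcards rather than being bound and unified. A set literal matches when the triple's value is any member of the set, so one clause behaves like the union of one clause per element.

```clojure
;; Toy triple matcher (not XTDB's query engine) illustrating the semantics:
;; a set in the e or v position matches if the triple's value is any member.
(def docs
  [{:xt/id :hobbits, :members #{:frodo :sam :merry :pippin}}
   {:xt/id :three-hunters, :members #{:aragorn :legolas :gimli}}])

(defn triples
  "Flattens docs into [e a v] triples, one per element of a set-valued attribute."
  [docs]
  (for [doc docs
        [a vs] (dissoc doc :xt/id)
        v (if (set? vs) vs #{vs})]
    [(:xt/id doc) a v]))

(defn term-matches?
  "A set term matches any of its members; any other term must be equal."
  [term x]
  (if (set? term) (contains? term x) (= term x)))

(defn match-clause
  "Returns triples matching an [e a v] clause; symbols act as wildcards here
  (a real engine would bind and unify them)."
  [triples [e a v]]
  (filter (fn [[te ta tv]]
            (and (or (symbol? e) (term-matches? e te))
                 (= a ta)
                 (or (symbol? v) (term-matches? v tv))))
          triples))

;; (set (map first (match-clause (triples docs) '[?group :members #{:frodo :aragorn}])))
;; => #{:hobbits :three-hunters}
```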
HTTP Server Security
:: Requirement: Users can configure their XTDB HTTP topology to use JWT security
The HTTP server module and remote Clojure API client can now be configured to use JWT security. This allows for integration with authentication systems such as AWS Cognito to protect write and read access to a node.
:: Requirement: Users can configure their XTDB HTTP topology to disable the submit-tx endpoint such that a given node is read-only
Various use-cases can benefit from read-only access to an XTDB node, allowing users to freely share access to data stored within XTDB without having to introduce reverse proxies or solve the authorisation problem at a lower level, such as using Kafka ACLs.
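As a rough illustration of why disabling a single endpoint is enough to make a node read-only, here is a toy Ring-style router. It is hypothetical and not XTDB's HTTP server module: the `/query` and `/submit-tx` paths, the `make-handler` function, the `:read-only?` option and the status codes are assumptions chosen for the sketch. When read-only, the write route is simply never registered, so no separate authorisation layer is needed to block writes.

```clojure
;; Toy Ring-style router (hypothetical, not XTDB's HTTP server module).
;; With :read-only? true the submit-tx route is never registered.
(defn make-handler [{:keys [read-only?]}]
  (let [routes (cond-> {[:post "/query"] (fn [_] {:status 200 :body "query results"})}
                 (not read-only?)
                 (assoc [:post "/submit-tx"] (fn [_] {:status 202 :body "transaction accepted"})))]
    (fn [{:keys [request-method uri] :as request}]
      (if-let [route-handler (get routes [request-method uri])]
        (route-handler request)
        {:status 404 :body "not found"}))))

(def read-write-node (make-handler {:read-only? false}))
(def read-only-node (make-handler {:read-only? true}))
```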
Deduplicated indexing of entity history
:: Requirement: Users can efficiently store multiple document versions against a single entity without incurring a storage penalty for identical attribute-value combinations
Even though storage is cheaper than ever, eliminating excessive storage usage is always a good idea. In the course of analysing disk usage more generally we identified that there are common scenarios where data across document versions only changes partially between historic and new versions. This is most clearly visible when modelling time-series data as a history of documents, where many attribute-value combinations are likely to be repeated. With the revamped index layout we have now considerably improved the performance seen in our nightly time-series benchmark tests.
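A toy count shows why deduplicating repeated attribute-value combinations matters for time-series-style histories. This is not XTDB's index layout: the sensor documents and the `av-pairs`, `naive-entries` and `deduped-entries` helpers are hypothetical, and a real index must still record which versions each pair belongs to. The sketch only counts how many distinct value entries need storing.

```clojure
;; Toy comparison (not XTDB's index layout): entries needed to store every
;; [entity attribute value] combination of every version, naively vs deduplicated.
(def versions
  ;; hypothetical history of one sensor entity; only :reading changes
  [{:xt/id :sensor-1 :site "north" :unit "celsius" :reading 20}
   {:xt/id :sensor-1 :site "north" :unit "celsius" :reading 21}
   {:xt/id :sensor-1 :site "north" :unit "celsius" :reading 21}
   {:xt/id :sensor-1 :site "north" :unit "celsius" :reading 22}])

(defn av-pairs
  "All [entity attribute value] combinations in one document version."
  [doc]
  (for [[a v] (dissoc doc :xt/id)]
    [(:xt/id doc) a v]))

(defn naive-entries
  "One entry per combination per version (3 attributes x 4 versions here)."
  [versions]
  (count (mapcat av-pairs versions)))

(defn deduped-entries
  "One entry per distinct combination across all versions."
  [versions]
  (count (set (mapcat av-pairs versions))))

;; (naive-entries versions)   => 12
;; (deduped-entries versions) => 5
```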
Also in 1.9:
- Built-in HTML UI for browsing XTDB data directly on the HTTP server
- Module stability classifications
- Removal of previously-deprecated APIs
For the full breakdown, see the release notes.
In the community
XTDB received some unexpected interest on Hacker News, with some good discussion on the nature of document databases, event sourcing and schema-on-write.
The Findka.com personalised recommendations platform is using XTDB within its open-source, Firebase-like Clojure stack called Biff.
Elsewhere, we’ve spoken to teams writing transaction log and document store backends for Google Cloud Datastore, teams integrating XTDB with Lucene, and teams integrating XTDB with various Distributed Ledger Technologies. As ever, we really appreciate hearing about all the interesting things people are working on, so please keep us posted!
Exciting things ahead
- XTDB Live - sign up to our newsletter on juxt.pro (see the footer) or keep an eye on our social channels for news about XTDB’s first virtual mini-conference event
- Speculative Transactions - as described in the Transaction Functions section above
- SQL Queries - powered by Apache Calcite, for ad-hoc queries that compile straight to XTDB Datalog
- JSON APIs - not everyone speaks edn yet
- Scalability Benchmarks - a.k.a. let’s not look at next month’s AWS bill
Have a nice day!