Status: ztsdb

11 May 2017

Leonardo Silvestri

ztsdb is still alpha. To transition to beta, additional testing and user validation are necessary, as well as some improvements in the areas outlined below.

R interface

ztsdb's time type maps to POSIXct. This is problematic because time has nanosecond precision, whereas POSIXct stores seconds since the epoch as a double-precision floating-point number and cannot resolve individual nanoseconds for present-day dates. ztsdb's duration, interval and period types do not yet have a mapping. zts (ztsdb's time-series type) maps to xts with a POSIXct index.
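The precision loss can be checked with a few lines of arithmetic. The sketch below is written in Python rather than R purely for illustration; it assumes only that a POSIXct value is an IEEE 754 double holding seconds since the epoch, and the helper name to_posixct_seconds is made up for this example.

```python
import math

NANOS_PER_SEC = 1_000_000_000

def to_posixct_seconds(epoch_ns: int) -> float:
    """Convert integer nanoseconds since the epoch to float seconds (POSIXct-style)."""
    return epoch_ns / NANOS_PER_SEC

# 2017-05-11 00:00:00 UTC expressed as integer nanoseconds since the epoch.
t_ns = 1_494_460_800 * NANOS_PER_SEC

a = to_posixct_seconds(t_ns)
b = to_posixct_seconds(t_ns + 1)   # one nanosecond later

# Gap between adjacent doubles at this magnitude, converted to nanoseconds.
spacing_ns = math.ulp(a) * NANOS_PER_SEC

print(a == b)        # the two distinct instants collapse to the same double
print(spacing_ns)    # smallest representable step near this date, in ns
```

Two instants one nanosecond apart become the same double, and the smallest step a double can represent near this date is a couple of hundred nanoseconds, which is why an integer-based type is needed on the R side.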

time could map to nanotime from the nanotime package and duration could map to integer64 from the bit64 package, but interval and period would still need a mapping. Another issue is that xts does not currently support nanotime as an index. Hopefully it will be possible to add nanotime to the list of valid xts index types. In the shorter term, forking xts or using data.table instead are options.

In order to make sensible decisions on this R interface there needs to be more input from ztsdb users. The current interface is sufficient to show the level of integration between R and ztsdb that can be achieved, but it is not production grade.

R functionality

Although ztsdb is a deliberately minimal subset of R and will continue to provide only complementary time-series DBMS functionality, there are certainly some core R functions that are still missing (e.g. rep, abs, grep, functionals, ...).

Performance

Many design choices were made with speed in mind. For example, incoming buffers are directly assembled into the data structure they represent (this is true for both requests and appends) and the internals are designed to eliminate spurious copies (for example in argument passing, list assignment, etc.).
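As a rough illustration of assembling an incoming buffer directly into its final form, the Python sketch below copies each received chunk exactly once into preallocated storage and then exposes that storage as a typed view. It is not ztsdb's implementation: the function name assemble_in_place, the float64 payload layout and the chunk list standing in for socket reads are all assumptions made for the example.

```python
import struct

def assemble_in_place(chunks, n_values):
    """Copy each incoming chunk once, straight into the final float64 storage."""
    storage = bytearray(8 * n_values)     # final backing store, allocated once
    target = memoryview(storage)
    offset = 0
    for chunk in chunks:                  # chunks stand in for successive socket reads
        target[offset:offset + len(chunk)] = chunk
        offset += len(chunk)
    if offset != len(storage):
        raise ValueError("payload shorter than announced")
    return target.cast("d")               # typed view over the same bytes, no extra copy

# A 3-value payload in native byte order, arriving in three uneven pieces.
payload = struct.pack("3d", 1.5, 2.5, 3.5)
values = assemble_in_place([payload[:5], payload[5:20], payload[20:]], 3)
print(list(values))
```

With a real socket, recv_into on a slice of the same memoryview would avoid even the intermediate chunk objects.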

Nonetheless, a rigorous performance study is needed to identify areas for improvement. Pending this investigation, the following areas are likely candidates:

Outgoing message threading. Right now an outgoing temporary that is returned as the result of a query is processed in the context thread. Sending a temporary could be handled in a separate thread, immediately freeing the context thread to service another query.
Searching for the object to append to. During an append operation the target is found by variable name lookup. There can be multiple lookups depending on the nesting level of the object (i.e. an object might be a member of a list or a sublist). Initial tests show that nested searches add significantly to the processing time of an append message. A more complex protocol could provide a numerical identifier to the client, and ztsdb could then use this identifier for an immediate lookup.
Search strategy in align. The align function does a linear search when the method is closest and a dichotomic (binary) search otherwise. In many cases this works well, but in some cases it might not. A heuristic that chooses between the two depending on the density of the time vectors might work better.
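The trade-off between the two search strategies for a closest-match alignment can be sketched as follows. This is a Python illustration, not ztsdb's align: the function names, the tie-breaking rule (ties go to the later observation) and the cost model used to pick a strategy are all assumptions. Both inputs are assumed to be sorted, and x is assumed to be non-empty and strictly increasing.

```python
from bisect import bisect_left
from math import log2

def closest_linear(x, y):
    """Merge-style scan: O(len(x) + len(y)); good when y is dense relative to x."""
    out, i = [], 0
    for t in y:
        # Advance while the next observation is at least as close to t.
        while i + 1 < len(x) and abs(x[i + 1] - t) <= abs(x[i] - t):
            i += 1
        out.append(i)
    return out

def closest_bisect(x, y):
    """Binary search per target: O(len(y) * log(len(x))); good when y is sparse."""
    out = []
    for t in y:
        j = bisect_left(x, t)
        if j == 0:
            out.append(0)
        elif j == len(x):
            out.append(len(x) - 1)
        else:
            # Pick the nearer neighbour; ties go to the later one.
            out.append(j if x[j] - t <= t - x[j - 1] else j - 1)
    return out

def align_closest(x, y):
    """Pick a strategy from a crude, hypothetical cost estimate."""
    n, m = len(x), len(y)
    if n == 0:
        raise ValueError("x must not be empty")
    use_linear = n + m <= m * max(1.0, log2(n))
    return closest_linear(x, y) if use_linear else closest_bisect(x, y)

x = [10, 20, 30, 40]
y = [9, 14, 15, 26, 100]
print(align_closest(x, y))
```

When y has few points relative to x, the binary search touches far fewer elements; when the two vectors have comparable density, the single merge pass avoids the logarithmic factor. A real heuristic would need to be calibrated against measurements.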

Durability

ztsdb uses Linux's mmap infrastructure to persist objects to files. From initial testing this seems to work well, but for very large time-series it could make sense to split files into smaller time chunks and ask ztsdb to string the chunks together. In particular, this would allow a better backup strategy for very long time-series (e.g. financial time-series).
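One way to picture the chunking idea is the Python sketch below, which is not how ztsdb stores data: the record layout (an int64 nanosecond timestamp followed by a float64 value), the one-day chunk width and the file naming scheme are all hypothetical. Each observation is appended to the file for its time chunk, and reading maps the chunk files in order to present one series.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<qd")            # (int64 epoch nanoseconds, float64 value)
CHUNK_NS = 86_400 * 1_000_000_000        # hypothetical chunk width: one day

def chunk_path(directory, epoch_ns):
    """File holding the chunk that contains epoch_ns (naming scheme is made up)."""
    return os.path.join(directory, f"chunk-{epoch_ns // CHUNK_NS:08d}.bin")

def append(directory, epoch_ns, value):
    """Append one observation to the chunk file its timestamp falls into."""
    with open(chunk_path(directory, epoch_ns), "ab") as f:
        f.write(RECORD.pack(epoch_ns, value))

def read_all(directory):
    """String the chunks together in time order by mapping each file in turn."""
    out = []
    for name in sorted(os.listdir(directory)):
        with open(os.path.join(directory, name), "rb") as f:
            with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
                for off in range(0, len(m), RECORD.size):
                    out.append(RECORD.unpack_from(m, off))
    return out

with tempfile.TemporaryDirectory() as d:
    append(d, 0, 1.0)
    append(d, 10, 3.0)
    append(d, CHUNK_NS + 5, 2.0)          # lands in the next day's file
    n_files = len(os.listdir(d))
    series = read_all(d)

print(n_files, series)
```

Under such a scheme only the most recent chunk changes, so older chunk files could be backed up once and left alone.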

Note that ztsdb's arrays are designed to handle mmap truncation from the beginning of the array, i.e. throwing away old observations. In fact, in-memory time-series already have this ability (see this example), but it is more difficult for file-backed mappings: using fallocate, it would only be possible to achieve this on an XFS filesystem.
