ztsdb is still alpha. To transition to beta, additional testing and user validation are necessary, as well as some improvements in the areas outlined below.
R interface
ztsdb's time type maps to POSIXct. This is very problematic because time has nanosecond precision whereas POSIXct is floating-point based and cannot achieve nanosecond precision. ztsdb's duration, interval and period types do not yet have a mapping. zts (ztsdb's time-series type) maps to xts with a POSIXct index.
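The precision problem can be made concrete with a few lines of Python, used here only because its float is the same IEEE 754 double that POSIXct relies on. This is a minimal sketch, not ztsdb or R code; `posixct_seconds` is a hypothetical helper and the timestamp is an arbitrary illustrative value.

```python
import math

def posixct_seconds(secs: int, nanos: int) -> float:
    """Seconds since the epoch as one double, the way POSIXct stores time."""
    return secs + nanos / 1e9

# Two instants exactly one nanosecond apart (illustrative values).
a = posixct_seconds(1_500_000_000, 123_456_789)
b = posixct_seconds(1_500_000_000, 123_456_790)

# Spacing between adjacent doubles near `a` (math.ulp needs Python 3.9+).
resolution = math.ulp(a)

print(resolution)  # about 2.4e-07 s, i.e. hundreds of nanoseconds
print(a == b)      # True: both instants collapse onto the same double
```

For present-day epoch values, adjacent doubles are therefore a few hundred nanoseconds apart, which is why a 64-bit integer count of nanoseconds is the natural target for the mapping.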
time could map to nanotime from the nanotime package and duration could map to integer64 from the bit64 package, but interval and period would still need a mapping. Another issue is that xts does not currently support nanotime as an index. Hopefully it will be possible to add nanotime to the list of valid xts index types. In the shorter term, a fork of xts or using data.table instead are options.
In order to make sensible decisions on this R interface there needs to be more input from ztsdb users. The current interface is sufficient to show the level of integration between R and ztsdb that can be achieved, but it is not production grade.
R functionality
Although ztsdb's language is a deliberately minimal subset of R and will continue to provide only complementary time-series DBMS functionality, some core R functions are certainly still missing (e.g. rep, abs, grep, functionals, ...).
Performance
Many design choices were made with speed in mind. For example, incoming buffers are directly assembled into the data structure they represent (this is true for both requests and appends) and the internals are designed to eliminate spurious copies (for example in argument passing, list assignment, etc.).
Nonetheless, a rigorous study of performance needs to be made in order to assess areas for improvement. Pending this investigation, these areas are currently potential candidates:
Outgoing message threading. Right now an outgoing temporary that is returned as the result of a query is processed in the context thread. Sending a temporary could be handled in a separate thread, thereby immediately freeing the context thread to service another query.
Searching for the object to append to. During an append operation a search is done by variable name lookup. There can be multiple lookups depending on the nesting level of an object (i.e. an object might be a member of a list or a sublist). Initial tests show that nested searches add significantly to the processing of an append message. A more complex protocol could provide a numerical identifier to the client and ztsdb could then use this identifier for an immediate lookup.
The align function does a linear search when the method is closest and a dichotomic (binary) search otherwise. In many cases this works well, but in some cases it might not. A heuristic algorithm that decides depending on the density of the time vectors might work better.
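For the append lookup item above, the idea of handing the client a numerical identifier can be sketched as follows. This is an illustration in Python under stated assumptions, not ztsdb's protocol or internals: the `Store` class, its methods and the nested-dictionary layout are all hypothetical.

```python
class Store:
    """Toy object store: nested dicts stand in for lists and sublists."""

    def __init__(self):
        self.root = {}     # variable name -> object (possibly nested)
        self.handles = []  # numeric identifier -> object

    def resolve(self, path):
        """One name lookup per nesting level."""
        obj = self.root
        for name in path:
            obj = obj[name]
        return obj

    def register(self, path):
        """Resolve once and return an identifier the client can reuse."""
        self.handles.append(self.resolve(path))
        return len(self.handles) - 1

    def append_by_path(self, path, value):
        self.resolve(path).append(value)    # repeats the nested search

    def append_by_id(self, handle, value):
        self.handles[handle].append(value)  # single indexed access


store = Store()
store.root = {"market": {"fx": {"eurusd": []}}}
path = ["market", "fx", "eurusd"]

handle = store.register(path)       # done once, at subscription time
store.append_by_path(path, 1.1)     # three name lookups
store.append_by_id(handle, 1.2)     # no name lookup
```

A real design would also have to invalidate an identifier when the variable it refers to is removed or reassigned, which this sketch ignores.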
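For the align item above, the trade-off between the two search strategies can be sketched on a simplified problem: for each element of a sorted vector `y`, find the position of the first element of a sorted vector `x` that is not smaller. This is not ztsdb's align semantics, and the cost-based rule in `lower_bound_auto` is a hypothetical heuristic, shown only to illustrate how relative density could drive the choice.

```python
import bisect
import math

def lower_bound_linear(x, y):
    """Walk both sorted vectors once: about len(x) + len(y) steps."""
    out, i = [], 0
    for t in y:
        while i < len(x) and x[i] < t:
            i += 1
        out.append(i)
    return out

def lower_bound_binary(x, y):
    """Binary search per element: about len(y) * log2(len(x)) steps."""
    return [bisect.bisect_left(x, t) for t in y]

def lower_bound_auto(x, y):
    """Pick the cheaper strategy from rough cost estimates (assumed heuristic)."""
    linear_cost = len(x) + len(y)
    binary_cost = len(y) * max(1.0, math.log2(len(x) + 1))
    if binary_cost < linear_cost:
        return lower_bound_binary(x, y)
    return lower_bound_linear(x, y)
```

When `y` is sparse relative to `x`, the binary search touches far fewer elements. When the two vectors have similar density, the single linear pass tends to be cheaper.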
Durability
ztsdb uses Linux's mmap infrastructure to persist objects to files. From initial testing this seems to work well, but for very large time-series it could make sense to split files into smaller time chunks and ask ztsdb to string the chunks together. In particular, this would allow a better backup strategy for very long time-series (e.g. financial time-series).
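The chunking idea can be sketched with memory-mapped files in Python. This is a minimal illustration under stated assumptions: the on-disk layout (raw little-endian doubles, no header) is made up and is not ztsdb's file format, chunks are assumed non-empty, and `write_chunk` and `read_chunks` are hypothetical helper names.

```python
import mmap
import os
import struct
import tempfile

def write_chunk(path: str, values: list[float]) -> None:
    """Persist doubles to `path` through a shared, writable mapping."""
    size = 8 * len(values)
    with open(path, "w+b") as f:
        f.truncate(size)                        # size the backing file first
        with mmap.mmap(f.fileno(), size) as m:  # writes go to the page cache
            struct.pack_into(f"<{len(values)}d", m, 0, *values)
            m.flush()                           # write dirty pages back to the file

def read_chunks(paths: list[str]) -> list[float]:
    """String several chunk files together, oldest first."""
    out: list[float] = []
    for path in paths:
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            with mmap.mmap(f.fileno(), size, access=mmap.ACCESS_READ) as m:
                out.extend(struct.unpack_from(f"<{size // 8}d", m, 0))
    return out
```

With one file per time range, a backup only needs to copy the most recent chunk, since older chunks no longer change.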
A quick note here to mention that ztsdb's arrays are designed to be able to handle mmap truncation from the beginning of the array, i.e. throwing away old observations. In fact, in-memory time-series already have this ability (see this example), but it's more difficult for file-backed mappings: using fallocate, it would only be possible to achieve this on a filesystem that supports it, such as XFS.