ztsdb is still in alpha. Moving to beta will require additional testing and user validation, as well as improvements in the areas outlined below.
ztsdb's time type maps to POSIXct. This is very problematic because time has nanosecond precision whereas POSIXct is floating point based and cannot achieve nanosecond precision. ztsdb's period types do not yet have an R counterpart. zts (ztsdb's time-series type) maps to xts with a POSIXct index. Instead, time could map to nanotime from the nanotime package and duration could map to integer64 from the bit64 package, but there is still period to take care of. Also, one issue is that xts does not currently support nanotime as an index. Hopefully it will be possible to add nanotime to the list of xts index types. In the shorter term, forking xts or using data.table instead are options.
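To make the precision problem concrete, here is a small Python sketch (Python is used only for illustration). It compares a count of nanoseconds held in an integer, which is how nanotime stores time (as an integer64), with a double holding seconds since the epoch, which is how POSIXct stores time. Python integers stand in for the 64-bit integers; the timestamp chosen is just an example.

```python
import math

def to_posixct_seconds(ns: int) -> float:
    """Seconds since the epoch as a double, the way POSIXct stores time."""
    return ns / 1e9

t0 = 1_500_000_000_000_000_000  # 2017-07-14T02:40:00Z as nanoseconds since the epoch
t1 = t0 + 1                     # one nanosecond later

# As integer nanoseconds (the nanotime/integer64 representation) they stay distinct.
print(t1 - t0)                                           # 1

# As double seconds both timestamps collapse onto the same value...
print(to_posixct_seconds(t0) == to_posixct_seconds(t1))  # True

# ...because near 1.5e9 s adjacent doubles are 2**-22 s (about 238 ns) apart.
print(math.ulp(1.5e9) * 1e9)
```

A double has a 53-bit significand, so at present-day epoch offsets the best resolution POSIXct can offer is a few hundred nanoseconds; an integer nanosecond count has no such loss.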
To make sensible decisions about this R interface, more input from ztsdb users is needed. The current interface is sufficient to show the level of integration between R and ztsdb that can be achieved, but it is not production grade.
Although ztsdb is a deliberately minimal subset of R and will continue
to provide only complementary time-series DBMS functionality, there
are certainly some core R functions that are still missing (e.g.
grep, functionals, ...).
Many design choices were made with speed in mind. For example, incoming buffers are directly assembled into the data structure they represent (this is true for both requests and appends) and the internals are designed to eliminate spurious copies (for example in argument passing, list assignment, etc.).
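The idea of assembling an incoming buffer directly into its final data structure can be sketched as follows. This is a minimal Python sketch, not ztsdb's code, and it assumes a hypothetical wire format in which the payload is a run of native-endian float64 values whose count is known up front (for example from a message header). Each received chunk is written straight into one preallocated buffer, which is then reinterpreted as doubles without a further copy.

```python
import struct

def assemble(chunks, n_values: int) -> memoryview:
    """Write received chunks in place into one preallocated buffer and expose
    it as float64 values without copying (hypothetical wire format)."""
    buf = bytearray(n_values * 8)           # final storage, allocated once
    view = memoryview(buf)
    pos = 0
    for chunk in chunks:                    # e.g. successive socket reads
        view[pos:pos + len(chunk)] = chunk  # copy into final position, no joins
        pos += len(chunk)
    if pos != len(buf):
        raise ValueError("incomplete message")
    return view.cast("d")                   # reinterpret the bytes as doubles

payload = struct.pack("3d", 1.0, 2.5, -4.0)  # 24 bytes of native doubles
values = assemble([payload[:5], payload[5:17], payload[17:]], 3)
print(values.tolist())                        # [1.0, 2.5, -4.0]
```

With a real socket, socket.recv_into(view[pos:]) can fill the buffer directly, which avoids even the per-chunk bytes objects used here.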
Nonetheless, a rigorous performance study is needed to identify areas for improvement. Pending this investigation, the following areas are potential candidates:
Outgoing message threading. Right now, a temporary that is returned as the result of a query is processed in the context thread. Sending the temporary could instead be handled in a separate thread, immediately freeing the context thread to service another query.
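A minimal Python sketch of handing results off to a separate sender thread, assuming a single FIFO queue between the context thread and one sender thread. The names serve, sender_loop, evaluate and transmit are hypothetical, and Python threads only illustrate the control flow, not ztsdb's implementation.

```python
import queue
import threading

def sender_loop(outbox: "queue.Queue", transmit) -> None:
    """Sender thread: takes finished results off the queue and sends them."""
    while True:
        item = outbox.get()
        if item is None:            # sentinel: no more results
            return
        client, result = item
        transmit(client, result)    # potentially slow: serialise + socket write

def serve(queries, evaluate, transmit) -> None:
    """Context thread: evaluates each query and hands the result off instead
    of sending it itself, so it can move straight on to the next query."""
    outbox: "queue.Queue" = queue.Queue()
    sender = threading.Thread(target=sender_loop, args=(outbox, transmit))
    sender.start()
    for client, expr in queries:
        outbox.put((client, evaluate(expr)))
    outbox.put(None)
    sender.join()

sent = []
serve([("a", 2), ("b", 3)], lambda x: x * x, lambda c, r: sent.append((c, r)))
print(sent)  # [('a', 4), ('b', 9)]
```

With one queue and one sender, results leave in the order the queries were evaluated; a real server would also need to handle send errors and back-pressure from slow clients.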
Searching for the object to append to. During an append operation, the target object is found by variable name lookup. There can be multiple lookups depending on the nesting level of the object (e.g. an object might be a member of a list or of a sublist). Initial tests show that nested searches add significantly to the processing time of an append message. A more complex protocol could give the client a numerical identifier, which ztsdb could then use for an immediate lookup.
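The numerical-identifier idea could look roughly like the Python sketch below. Registry, register, lookup_path and lookup_id are hypothetical names; the point is that a nested name path is resolved once, after which every append uses a single index operation regardless of nesting depth.

```python
class Registry:
    """Hypothetical sketch: hands out integer ids for (possibly nested)
    objects so that later appends can skip the name walk."""

    def __init__(self, root: dict):
        self.root = root
        self.handles: list = []

    def lookup_path(self, path):
        """Name-based lookup: one dictionary lookup per nesting level."""
        obj = self.root
        for name in path:
            obj = obj[name]
        return obj

    def register(self, path) -> int:
        """Resolve the path once and return a numeric id for the object."""
        self.handles.append(self.lookup_path(path))
        return len(self.handles) - 1

    def lookup_id(self, handle: int):
        """Id-based lookup: a single index operation, independent of nesting."""
        return self.handles[handle]

root = {"market": {"eu": {"prices": []}}}
reg = Registry(root)
h = reg.register(["market", "eu", "prices"])  # done once, e.g. at subscription
reg.lookup_id(h).append(101.5)                # each append: direct lookup
print(root["market"]["eu"]["prices"])         # [101.5]
```

A real protocol would also have to invalidate identifiers when the underlying variable is removed or reassigned, which is part of what makes it more complex.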
The align function does a linear search when the method is "closest" and a dichotomic (binary) search otherwise. This works well in many cases, but not necessarily in all. A heuristic that chooses the search strategy depending on the density of the time vectors might work better.
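To illustrate the trade-off, the Python sketch below implements a "closest" lookup both ways on sorted numeric timestamps: a single forward walk costing about n + m steps, and one binary search per target costing about m log2 n. The selection rule in closest_indices is one possible heuristic based on the relative sizes of the two vectors (a crude proxy for their relative density); it is an assumption, not ztsdb's align. Ties go to the earlier element.

```python
import bisect
import math

def pick_closest(x, i, t):
    """Given i = bisect_left(x, t) on sorted, non-empty x, return the index
    of the element closest to t (ties go to the earlier element)."""
    if i == 0:
        return 0
    if i == len(x):
        return len(x) - 1
    return i if x[i] - t < t - x[i - 1] else i - 1

def closest_binary(x, ys):
    """One binary search per target: about m * log2(n) comparisons."""
    return [pick_closest(x, bisect.bisect_left(x, t), t) for t in ys]

def closest_linear(x, ys):
    """One forward walk over x for sorted targets ys: about n + m steps."""
    out, i = [], 0
    for t in ys:
        while i < len(x) and x[i] < t:  # i ends at bisect_left(x, t)
            i += 1
        out.append(pick_closest(x, i, t))
    return out

def closest_indices(x, ys):
    """Hypothetical heuristic: use whichever strategy the cost model favours.
    Sparse targets favour binary search; dense targets favour the walk."""
    n, m = len(x), len(ys)
    if m * math.log2(max(n, 2)) < n + m:
        return closest_binary(x, ys)
    return closest_linear(x, ys)

x = [0, 10, 20, 30]
ys = [-5, 4, 5, 6, 26, 40]
print(closest_indices(x, ys))  # [0, 0, 0, 1, 3, 3]
```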
ztsdb uses Linux's mmap infrastructure to persist objects to files. From initial testing this seems to work well, but for very large time series it could make sense to split files into smaller time chunks and have ztsdb string the chunks together. In particular, this would allow a better backup strategy for very long time series (e.g. financial time series).
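Splitting a series into time chunks could be sketched as below in Python. The one-day chunk width, the dict standing in for per-chunk files, and all function names are hypothetical; timestamps are integer nanoseconds since the epoch and the series is assumed to be time-sorted.

```python
from collections import defaultdict

CHUNK_NS = 86_400 * 10**9  # hypothetical chunk width: one day in nanoseconds

def chunk_key(t_ns: int) -> int:
    """Chunk index of a timestamp; floor division also handles pre-epoch times."""
    return t_ns // CHUNK_NS

def split_into_chunks(times, values):
    """Group a time-sorted series into per-chunk (times, values) pieces;
    each piece would be persisted to, and backed up as, its own file."""
    chunks = defaultdict(lambda: ([], []))
    for t, v in zip(times, values):
        ts, vs = chunks[chunk_key(t)]
        ts.append(t)
        vs.append(v)
    return dict(chunks)

def read_range(chunks, start_ns, end_ns):
    """String together the chunks overlapping [start_ns, end_ns) in time order."""
    lo, hi = chunk_key(start_ns), chunk_key(end_ns - 1)
    out = []
    for key in sorted(k for k in chunks if lo <= k <= hi):
        ts, vs = chunks[key]
        out.extend((t, v) for t, v in zip(ts, vs) if start_ns <= t < end_ns)
    return out

day = CHUNK_NS
chunks = split_into_chunks([0, 5, day, day + 1, 3 * day], [1, 2, 3, 4, 5])
print(sorted(chunks))                  # [0, 1, 3]
print(read_range(chunks, 5, day + 2))  # [(5, 2), (86400000000000, 3), (86400000000001, 4)]
```

Since appends only ever touch the most recent chunk, older chunk files could be backed up once and then left alone.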
A quick note: ztsdb's arrays are designed to be
able to handle
mmap truncation from the beginning of the array,
i.e. throwing away old observations. In fact, in-memory time series
already have this ability (see
this example), but it is
more difficult for file-backed mappings, where
this would only be possible on an XFS filesystem.
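For the in-memory case, one common way to discard old observations cheaply is to keep a start offset into the storage and advance it, compacting only occasionally. The Python class below is a hypothetical sketch of that technique, not ztsdb's array implementation, and it does not address the file-backed case.

```python
class FrontTruncatableArray:
    """Hypothetical sketch: append at the back, discard from the front in O(1)
    by advancing a start offset; dead space is reclaimed occasionally."""

    def __init__(self):
        self._data = []
        self._start = 0

    def append(self, value) -> None:
        self._data.append(value)

    def drop_front(self, k: int) -> None:
        """Throw away the k oldest observations."""
        if k < 0:
            raise ValueError("k must be non-negative")
        self._start = min(self._start + k, len(self._data))
        # Compact once at least half of the storage is dead, so the cost of
        # shifting the survivors is amortised over many drops.
        if self._start * 2 >= len(self._data):
            del self._data[:self._start]
            self._start = 0

    def __len__(self) -> int:
        return len(self._data) - self._start

    def __getitem__(self, i: int):
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self._data[self._start + i]

    def to_list(self) -> list:
        return self._data[self._start:]

a = FrontTruncatableArray()
for v in range(10):
    a.append(v)
a.drop_front(3)    # offset moves, nothing is copied yet
a.drop_front(3)    # more than half is dead now, so storage is compacted
print(a.to_list())  # [6, 7, 8, 9]
```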