Running ztsdb
Starting ztsdb
ztsdb can be started on the command line and by default will provide a REPL. It does not provide command line editing/history but this can be achieved with the readily available rlwrap utility.
Process options
Here are the options as obtained with the --help option. Note that there is a server mode and an eval mode. The eval mode runs the expression on the command line and then exits; it is useful for example to run some code for verification (this is the mode used to run ztsdb's R unit tests).
Some of the configuration options can be overridden on the command line. Command line options take precedence over the configuration file, but there are otherwise no differences.
Usage: ztsdb [-h|--help] [-V|--version] [-lENUM|--log.level=ENUM]
         [-Cdirname|--config.path=dirname] [-iztsdb code|--init.code=ztsdb
         code]
  or : ztsdb -eexpression|--expression=expression
  or : ztsdb [-aSTRING|--address=STRING] [-pINT|--port=INT]

  -h, --help                    Print help and exit
  -V, --version                 Print version and exit
  -l, --log.level=ENUM          log level  (possible values="TRACE",
                                  "DEBUG", "INFO", "ERROR")
  -C, --config.path=dirname     config file path
  -i, --init.code=ztsdb code    initial code

 Mode: server
  -a, --address=STRING          address
  -p, --port=INT                listen port

 Mode: eval
  -e, --expression=expression   evaluate expression and exit (mandatory)
Config file
The following options are configurable; here they are shown commented out with their default values.
# address=""
# port=0
# timezone.path="/usr/share/zoneinfo"
# logfile.path="/tmp"
# logfile.name="ztsdb.log"
# log.level=INFO
# init.code=""
# prompt="> "
# timezone="UTC"
# digits=7
# scipen=0
# width=100
# max.print=99999
# expressions=10000
# data.q.size=100000
# sig.q.size=100000
# commbuf.ttl.secs=60
# in.req.ttl.secs=180
# in.rsp.ttl.secs=180
Except for address, port and timezone.path, configuration options can be modified while the process is running with the options function, which works as in R:
options() ## prints the value of all options
options()$digits ## prints the value of option 'digits'
options(digits=6) ## change the value of option 'digits' to 6
Main concepts
REPL
ztsdb
When started in server mode (and connected to a terminal) the ztsdb executable has a read–eval–print loop (REPL). This means that a ztsdb instance can be used just as a shell client.
R
An alternative to the ztsdb REPL is to use R's REPL. The difference of course is that there is no local instance, so any code must be to the right of a connection and the ? query operator. Since some types are not transmissible via TCP (builtin, _function, timer, connection) the R REPL will not print these types directly (but a string can be obtained using the str function).
Communication
All communication with a ztsdb instance is via TCP. A connection needs to be created with an IP address and a port. This is true both for the R and the ztsdb REPL, and here again, the code is exactly the same on both.
ztsdb has incoming queues of configurable length. These queues should be sized so that they can buffer the updates and queries that arrive while a query is being interpreted. On the data path the maximum number of buffers is controlled by data.q.size, whereas on the signalling path it is controlled by sig.q.size (signalling messages are TCP connection messages, i.e. up or down).
Interpretation contexts
Each connection object establishes a new interpretation context on the remote side, even if the address and port are the same. When a connection goes down, its interpretation context is torn down with it and all variables that were defined in the associated environment are destroyed. See the next section for more about interpretation contexts.
Isolation
A connection, a timer and the local REPL always have a one-to-one association with an interpretation context (IC). An IC is composed of an environment (ztsdb has a hierarchical concept of environments that is similar to R's) and also maintains the state of both outgoing and incoming queries. Variables created with the single assign operator <- are always created in the local environment of the IC and therefore can never be accessed from another IC. On the other hand, variables created with the double assign operator <<- are created in the global environment and are therefore accessible from any IC. This means that any data that is meant to be shared between multiple users must be created in the global environment.
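The visibility rule above can be pictured with a small toy model. The sketch below is not ztsdb's implementation: the Env and InterpretationContext types and their member functions are hypothetical names chosen only to show how a per-IC local environment and a shared global environment would behave.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy model only: these types are hypothetical and are not ztsdb internals.
using Env = std::map<std::string, double>;

struct InterpretationContext {
    std::shared_ptr<Env> global;  // shared by every IC
    Env local;                    // private to this IC

    explicit InterpretationContext(std::shared_ptr<Env> g) : global(std::move(g)) {}

    // Models `name <- value`: binds in this IC's own environment.
    void assign(const std::string& name, double value) { local[name] = value; }

    // Models `name <<- value`: binds in the shared global environment.
    void global_assign(const std::string& name, double value) { (*global)[name] = value; }

    // Lookup tries the local environment first, then the global one.
    // Returns nullptr when the name is not visible from this IC.
    const double* lookup(const std::string& name) const {
        auto l = local.find(name);
        if (l != local.end()) return &l->second;
        auto g = global->find(name);
        return g != global->end() ? &g->second : nullptr;
    }
};
```

With two contexts built on the same global environment, a value bound through assign in the first is not found by lookup in the second, whereas a value bound through global_assign is found from both.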
The interpreter is single-threaded, so two queries can never execute at the same time. However, whenever some code attempts to access the result of a query, a future exception is raised and the executing IC is suspended until it receives the response or the request times out. While an IC is suspended, an incoming query to another IC can be interpreted.
Durability
Any array or time-series is persisted to disk if the file argument is specified. The path given must not be an already existing directory.
Note that on restart, ztsdb does not automatically load any data. Loading is done via the init.code option, which can also be specified as a command line argument. If loading the data and restoring the desired state (e.g. timers, in-memory arrays/time-series) is complex, the init code will usually consist of a call to the source built-in function. The source file may contain code of arbitrary complexity to get the database up and running in the desired state. A simple demo of this can be found here.
Copying of an object that is persisted to disk is not allowed and a
persistent object is always shown as locked when using the str
function:
a <- matrix(1:9, 3, 3, file="/tmp/9632346557")
str(a)
## double - ord [1:3, 1:3] 1 2 3 4 5 6 7 8 9
## - mmap file = /tmp/9632346557, locked
Locking
In addition to the automatic locking of objects persisted to disk, in-memory objects can also be marked as locked. This makes it possible to ensure that a large object is not unwittingly copied. The functions to lock and unlock are, unsurprisingly, lock and unlock. If the object is passed by reference, it is locked/unlocked in place; if it is passed by value, the function returns a locked/unlocked copy of the object. Finally, an object's lock status can be tested with the function is.locked:
a <- matrix(1:9, 3, 3)
is.locked(a)
## [1] FALSE
lock(--a)
is.locked(a)
## [1] TRUE
unlock(--a)
is.locked(a)
## [1] FALSE
Pass by reference
In a DBMS with persistent structures a copy is at best undesirable and often impossible. For this reason ztsdb adds an explicit pass by reference operator -- for function arguments. Here are a couple of examples:
Arguments may be passed by reference to a user function:
f <- function(x) x[1] <- 0
a <- 1
f(a)
a        ## 'a' is 1
f(--a)
a        ## 'a' is 0
Arguments may also be passed by reference to built-in functions. In the following example a memory mapped array is created; using pass by reference for the rbind operation then guarantees that the data will be added in place (i.e. on disk) to the memory mapped structure:
a <- matrix(1:9, 3, 3, file="/tmp/test_pass_by_reference")
rbind(--a, 10:12)
a
## produces the following result:
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
## [4,]   10   11   12
Bulk insert
Bulk insert can be done with the read.csv function. See CSV read/write.
Bulk insert can also be done via an R xts time series. In an R session, let x be a variable of xts type and c1 a variable of type connection. The time series x can then be transferred to the ztsdb instance connected to c1 either by creating a new remote time-series or by appending to an existing one. Both cases make use of the escape operator ++ (see Escape operator), which transfers local data to the remote instance as part of the query.
Creating: using the special assign <<- a new z is created on the remote instance in the global environment. It is of course possible to use the normal left assign operator <- if one wants z to remain in the local interpretation context (see Interpretation contexts).
c1 ? (z <<- ++x)
Appending: we assume in the following example that there already exists on the remote ztsdb a variable of zts type named z. Its number of columns must match the number of columns of the xts x in the R session. Furthermore, the first row of x must have a time value that is strictly larger than the last time value of z, because time-series must have a strictly sorted index. Also note the use of the pass by reference operator -- on z:
c1 ? rbind(--z, ++x)
Live appends
Live appends can be done from C or C++ after establishing a standard TCP connection to a remote ztsdb instance. A few examples are provided with the ztsdb source code:
- C simple example
- C++ simple example using only C++ standard library structures
- A more elaborate C++ example using some of ztsdb's structures
Appends can be made for the following types: double, time, duration, interval, bool, and of course for zts.
The C++ signature, defined in zcpp_stdlib.hpp, is the following:
Global::buflen_pair make_append_msg(const std::vector<std::string>& name,
                                    const std::vector<Global::dtime>& idx,
                                    const std::vector<double>& v);
The name parameter identifies the zts to which the data will be appended. If the vector contains a single element, a variable of type zts with that name is looked up in the global environment and, if found, appended to. If it contains multiple elements, the first element is taken as the name of a list, which is likewise looked up in the global environment; the subsequent strings are used to look up nested list elements. This allows zts objects inside lists at arbitrary depths to be updated.
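The lookup rule described above can be sketched as follows. This is an illustration only, not ztsdb's code: the Node type, the Scope alias and the resolve function are hypothetical stand-ins, and returning a null pointer on failure is an assumption made for the sketch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a ztsdb value: either a zts or a list of named values.
struct Node {
    bool is_zts = false;
    std::map<std::string, std::shared_ptr<Node>> children;  // used when the node is a list
};

using Scope = std::map<std::string, std::shared_ptr<Node>>;

// Resolve a name path. The first element is looked up in the global
// environment; each following element is looked up in the list found so far.
// Returns nullptr if any step fails or if the final value is not a zts.
const Node* resolve(const Scope& global, const std::vector<std::string>& name) {
    const Scope* scope = &global;
    const Node* found = nullptr;
    for (const auto& key : name) {
        auto it = scope->find(key);
        if (it == scope->end()) return nullptr;
        found = it->second.get();
        scope = &found->children;
    }
    return (found != nullptr && found->is_zts) ? found : nullptr;
}
```

Under these assumptions, a one-element path addresses a zts stored directly in the global environment, while a path such as {"l", "inner", "z"} walks through the list l and its element inner before reaching z.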
Note that idx and v can have multiple entries, but obviously they need to have the same number of rows. The elements of v are matched to the columns of the zts whatever its dimensionality (remember that a zts can be of arbitrary dimensions).
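Because idx and v must describe the same number of rows, and because time-series must keep a strictly sorted index, a client may want to check its data before building an append message. The sketch below is a hypothetical helper and is not part of ztsdb. It assumes timestamps can be modelled as 64-bit integers (the real Global::dtime type is defined by ztsdb), and it assumes the caller knows the number of values per row of the target zts and the last time value already stored.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: a plain integer timestamp stands in for Global::dtime.
using Timestamp = std::int64_t;

// Hypothetical pre-flight check, not a ztsdb function.
// values_per_row is the number of values in one row of the target zts
// (the product of all its dimensions except the first).
bool append_inputs_ok(const std::vector<Timestamp>& idx,
                      const std::vector<double>& v,
                      std::size_t values_per_row,
                      Timestamp last_existing) {
    // idx and v must describe the same number of rows.
    if (idx.empty() || values_per_row == 0) return false;
    if (v.size() != idx.size() * values_per_row) return false;

    // The index must stay strictly increasing, starting after the
    // last time value already present in the series.
    Timestamp prev = last_existing;
    for (Timestamp t : idx) {
        if (t <= prev) return false;
        prev = t;
    }
    return true;
}
```

A call such as append_inputs_ok(idx, v, 3, last) would be made just before make_append_msg for a three-column zts, so that malformed batches are rejected on the client side rather than sent.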
Logging
Logging is controlled by the options logfile.path, logfile.name and log.level, whose names are self-explanatory. ztsdb logs events and errors, but not on the data path, as this could lead to unacceptable performance degradation. Errors and information arising on the data path are instead exposed through statistics and info.
Stats and info
Conceptually, ztsdb is divided into three layers:
- the net layer, which takes care of the TCP connections and the segmentation and reassembly of buffers,
- the message layer, which takes care of routing a message to the correct interpretation context,
- the context layer, which handles the interpretation of queries and the state of both incoming and outgoing requests.
For each of these layers there exists a set of statistics and a set of information data.
Statistics can be obtained for each layer with:
stats.net()
stats.msg()
stats.ctx()
And state information can be obtained in a similar way:
info.net()
info.msg()
info.ctx()