Running ztsdb

Starting ztsdb

ztsdb is started from the command line and, by default, provides a REPL. The REPL has no built-in line editing or history, but both can be added with the widely available rlwrap utility, for example by running rlwrap ztsdb.

Process options

The options below are those printed by --help. Note that there are two modes: server mode and eval mode. Eval mode evaluates the expression given on the command line and then exits; it is useful, for example, for running code for verification (this is the mode used to run ztsdb's R unit tests).

Some configuration options can also be given on the command line. A value given on the command line overrides the one in the configuration file; otherwise the two are equivalent.

```
Usage: ztsdb [-h|--help] [-V|--version] [-lENUM|--log.level=ENUM]
         [-Cdirname|--config.path=dirname] [-iztsdb code|--init.code=ztsdb
         code]
  or : ztsdb -eexpression|--expression=expression
  or : ztsdb [-aSTRING|--address=STRING] [-pINT|--port=INT]

  -h, --help                   Print help and exit
  -V, --version                Print version and exit
  -l, --log.level=ENUM         log level  (possible values="TRACE",
                                 "DEBUG", "INFO", "ERROR")
  -C, --config.path=dirname    config file path
  -i, --init.code=ztsdb code   initial code

 Mode: server
  -a, --address=STRING         address
  -p, --port=INT               listen port

 Mode: eval
  -e, --expression=expression  evaluate expression and exit (mandatory)
```

Config file

The following options are configurable; here they are shown commented out with their default values.

```
# address=""
# port=0
# timezone.path="/usr/share/zoneinfo"
# logfile.path="/tmp"
# logfile.name="ztsdb.log"
# log.level=INFO
# init.code=""
# prompt="> "

# timezone="UTC"
# digits=7
# scipen=0
# width=100
# max.print=99999
# expressions=10000

# data.q.size=100000
# sig.q.size=100000
# commbuf.ttl.secs=60
# in.req.ttl.secs=180
# in.rsp.ttl.secs=180
```
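
To illustrate the file format and the precedence rule described above, here is a minimal C++ sketch. It assumes a plain key=value syntax in which lines starting with # are comments and string values are double-quoted, as in the listing. The names parse_config and merge are hypothetical; this is not ztsdb's own configuration parser.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

using Options = std::map<std::string, std::string>;

// Strip leading and trailing spaces/tabs.
static std::string trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t");
  return s.substr(b, e - b + 1);
}

// Hypothetical parser: reads "key=value" lines, skips blank lines and
// '#' comments, and removes surrounding double quotes from values.
Options parse_config(const std::string& text) {
  Options opts;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    line = trim(line);
    if (line.empty() || line[0] == '#') continue;
    const auto eq = line.find('=');
    if (eq == std::string::npos) continue;  // ignore malformed lines
    const std::string key = trim(line.substr(0, eq));
    std::string val = trim(line.substr(eq + 1));
    if (val.size() >= 2 && val.front() == '"' && val.back() == '"')
      val = val.substr(1, val.size() - 2);
    opts[key] = val;
  }
  return opts;
}

// Command-line values take precedence over configuration-file values.
Options merge(Options file_opts, const Options& cli_opts) {
  for (const auto& kv : cli_opts) file_opts[kv.first] = kv.second;
  return file_opts;
}
```

Keeping parsing and merging separate makes the precedence explicit: whatever the file says, a value supplied with the corresponding command-line option wins.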

With the exception of address, port and timezone.path, configuration options can be modified while the process is running, using the options function as in R:

```
options()           ## prints the value of all options
options()$digits    ## prints the value of option 'digits'

options(digits=6)   ## change the value of option 'digits' to 6
```
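
The rule that every option except address, port and timezone.path may change at runtime can be modelled as follows. This is a hypothetical C++ sketch: the RuntimeOptions class is not part of ztsdb, and option values are simply kept as strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical model of options(): values can be read and updated, except
// for the options that are fixed once the process has started.
class RuntimeOptions {
 public:
  explicit RuntimeOptions(std::map<std::string, std::string> initial)
      : values_(std::move(initial)) {}

  const std::string& get(const std::string& key) const { return values_.at(key); }

  // Analogue of options(key=value); throws for options fixed at startup.
  void set(const std::string& key, const std::string& value) {
    static const std::set<std::string> fixed = {"address", "port", "timezone.path"};
    if (fixed.count(key) > 0)
      throw std::invalid_argument("option '" + key + "' cannot be modified at runtime");
    values_[key] = value;
  }

 private:
  std::map<std::string, std::string> values_;
};
```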

Main concepts

REPL

ztsdb

When started in server mode and attached to a terminal, the ztsdb executable provides a read-eval-print loop (REPL). This means that a ztsdb instance can itself be used as an interactive client shell.

R

An alternative to the ztsdb REPL is R's REPL. The difference, of course, is that there is no local ztsdb instance, so all ztsdb code must appear to the right of a connection and the ? query operator. Because some types cannot be transmitted over TCP (builtin, _function, timer, connection), the R REPL cannot print values of these types directly; a string representation can be obtained with the str function instead.

Communication

All communication with a ztsdb instance is over TCP. A connection must be created with an IP address and a port. This applies to both the R and the ztsdb REPL, and, here again, the code is exactly the same in both.

ztsdb has incoming queues of configurable length. They should be sized so that they can buffer the incoming updates and queries that arrive while a query is being interpreted. On the data path the maximum number of buffers is set by data.q.size, and on the signalling path by sig.q.size (signalling messages are TCP connection events, i.e. a connection going up or down).
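
As a rough model of these queues, the C++ sketch below bounds a FIFO by a maximum number of buffers, the role played by data.q.size and sig.q.size. The text does not say what ztsdb does when a queue is full; rejecting the new buffer, as done here, is an assumption, and BoundedQueue is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <string>
#include <utility>

// Hypothetical FIFO holding at most max_size buffers.
template <typename T>
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t max_size) : max_size_(max_size) {}

  // Enqueue unless the queue is full; returns false when the item is refused.
  bool push(T item) {
    if (q_.size() >= max_size_) return false;
    q_.push_back(std::move(item));
    return true;
  }

  // Dequeue in arrival order, e.g. once the interpreter is free again.
  std::optional<T> pop() {
    if (q_.empty()) return std::nullopt;
    T item = std::move(q_.front());
    q_.pop_front();
    return item;
  }

  std::size_t size() const { return q_.size(); }

 private:
  std::size_t max_size_;
  std::deque<T> q_;
};
```

The sizing question is then how many buffers can arrive during the longest query interpretation: anything beyond max_size in that window is refused in this model.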

Interpretation contexts

Each connection object establishes a new interpretation context on the remote side, even if the address and port are the same. When a connection goes down, the associated interpretation context is torn down with it, and all variables defined in its environment are destroyed. See the next section for more about interpretation contexts.

Isolation

A connection, a timer and the local REPL each have a one-to-one association with an interpretation context (IC). An IC consists of an environment (ztsdb has a hierarchical concept of environments similar to R's) and also keeps track of the state of both outgoing and incoming queries. Variables created with the single assignment operator <- are always created in the local environment of the IC and therefore can never be accessed from another IC. Conversely, variables created with the double assignment operator <<- are created in the global environment and are therefore accessible from any IC. Consequently, any data meant to be shared between multiple users must be created in the global environment.
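
The isolation rule can be sketched as one private environment per IC plus a single shared global environment. In this hypothetical C++ model, assign_local stands for <- and assign_global for <<-; searching the local environment before the global one is an assumption modelled on R's scoping, and values are reduced to doubles for brevity.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

using Env = std::map<std::string, double>;

// Hypothetical interpretation context: private locals, shared globals.
class InterpretationContext {
 public:
  explicit InterpretationContext(std::shared_ptr<Env> global)
      : global_(std::move(global)) {}

  void assign_local(const std::string& name, double v) { local_[name] = v; }      // x <- v
  void assign_global(const std::string& name, double v) { (*global_)[name] = v; } // x <<- v

  // Look a name up in the local environment first, then in the global one.
  std::optional<double> lookup(const std::string& name) const {
    if (auto it = local_.find(name); it != local_.end()) return it->second;
    if (auto it = global_->find(name); it != global_->end()) return it->second;
    return std::nullopt;
  }

 private:
  Env local_;                    // destroyed together with the IC
  std::shared_ptr<Env> global_;  // the same object for every IC
};
```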

The interpreter is single-threaded, so two queries never execute at the same time. However, when code attempts to access the result of an outgoing query that has not yet been received, a future exception is raised and the IC is suspended until it receives the response or the request times out. While an IC is suspended, incoming queries to other ICs are interpreted.
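
A simplified, hypothetical C++ model of this scheduling is shown below: events are handled strictly one at a time, a query that needs a pending remote result suspends its IC, and other ICs keep being served until the response arrives. Timeouts are omitted, and holding later queries for a suspended IC in a per-IC backlog is an assumption of the sketch, not documented behaviour.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <set>
#include <string>
#include <vector>

// One unit of work arriving at the interpreter (hypothetical structure).
struct Event {
  int ic;             // target interpretation context
  bool is_response;   // true: response to an outgoing query; false: new query
  bool waits_remote;  // for a query: does it access a result not yet received?
};

class Interpreter {
 public:
  std::vector<std::string> trace;  // order in which work was done

  // Handle events strictly one at a time; nothing runs concurrently.
  void run(std::deque<Event> events) {
    for (; !events.empty(); events.pop_front()) {
      const Event& e = events.front();
      if (e.is_response) {
        if (suspended_.erase(e.ic) > 0) {
          trace.push_back("resume " + std::to_string(e.ic));
          drain_backlog(e.ic);
        }
      } else if (suspended_.count(e.ic) > 0) {
        backlog_[e.ic].push_back(e);  // assumption: held until the IC resumes
      } else {
        evaluate(e);
      }
    }
  }

 private:
  void evaluate(const Event& q) {
    if (q.waits_remote) {  // a "future" is hit: suspend this IC only
      suspended_.insert(q.ic);
      trace.push_back("suspend " + std::to_string(q.ic));
    } else {
      trace.push_back("eval " + std::to_string(q.ic));
    }
  }

  // Run held queries for a resumed IC until it suspends again or none remain.
  void drain_backlog(int ic) {
    auto& held = backlog_[ic];
    while (!held.empty() && suspended_.count(ic) == 0) {
      Event q = held.front();
      held.pop_front();
      evaluate(q);
    }
  }

  std::set<int> suspended_;
  std::map<int, std::deque<Event>> backlog_;
};
```

In this model a query to IC 2 runs while IC 1 waits for its response, which is the behaviour the paragraph above describes; true parallelism never occurs.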

Durability

An array or time-series is persisted to disk if the file argument is specified when it is created. The given path must not be an existing directory.

Note that on restart, ztsdb does not automatically load any data. Loading must be done through the init.code option, which can also be specified on the command line (-i or --init.code). If loading the data and setting up the desired state (e.g. timers, in-memory arrays and time-series) is complex, the init code will usually consist of a call to the built-in source function. The sourced file may contain code of arbitrary complexity to bring the database up in the desired state. A simple demo of this can be found here.

Copying an object that is persisted to disk is not allowed, and the str function always shows a persistent object as locked:

```
a <- matrix(1:9, 3, 3, file="/tmp/9632346557")
str(a)
## double - ord [1:3, 1:3] 1 2 3 4 5 6 7 8 9
## - mmap file = /tmp/9632346557, locked
```

Locking

In addition to the automatic locking of objects persisted to disk, in-memory objects can also be marked as locked. This makes sure that a large object is not copied unwittingly. The functions to lock and unlock are, unsurprisingly, lock and unlock. If the object is passed by reference, it is locked or unlocked in place; if it is passed by value, the function returns a locked or unlocked copy of the object. Finally, an object's lock status can be tested with the function is.locked:

```
a <- matrix(1:9, 3, 3)
is.locked(a)
## [1] FALSE

lock(--a)
is.locked(a)
## [1] TRUE

unlock(--a)
is.locked(a)
## [1] FALSE
```
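
The two calling conventions of lock can be compared with C++ reference and value parameters. In this hypothetical analogy, lock_in_place plays the role of lock(--a), locked_copy the role of lock(a) returning a locked copy, and checked_copy shows a copy being refused for a locked object. Array is an illustrative type, not a ztsdb type.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Illustrative stand-in for an array carrying a lock flag.
struct Array {
  std::vector<double> data;
  bool locked = false;
};

// By reference: the caller's object itself becomes locked.
void lock_in_place(Array& a) { a.locked = true; }

// By value: the argument is a copy, which is locked and returned;
// the caller's object is unchanged.
Array locked_copy(Array a) {
  a.locked = true;
  return a;
}

// A copy operation that refuses locked objects, which is the protection
// that locking provides against unwitting copies of large objects.
Array checked_copy(const Array& a) {
  if (a.locked) throw std::logic_error("cannot copy a locked object");
  return a;
}
```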

Pass by reference

In a DBMS with persistent structures, a copy is at best undesirable and often impossible. For this reason ztsdb adds an explicit pass-by-reference operator, --, for function arguments. Here are a couple of examples:

Arguments may be passed by reference to a user function:

```
f <- function(x) x[1] <- 0
a <- 1
f(a)
a          ## 'a' is 1
f(--a)
a          ## 'a' is 0
```

Arguments may also be passed by reference to built-in functions. In the following example a memory mapped array is created; using pass by reference for the rbind operation then guarantees that the data will be added in place (i.e. on disk) to the memory mapped structure:
```
a <- matrix(1:9, 3, 3, file="/tmp/test_pass_by_reference")
rbind(--a, 10:12)
a
## produces the following result:
##       [,1] [,2] [,3]
##  [1,]  1    4    7  
##  [2,]  2    5    8  
##  [3,]  3    6    9  
##  [4,] 10   11   12  
```

Bulk insert

Bulk insert can be done with the read.csv function. See CSV read/write.

Bulk insert can also be done via an R xts time series. In an R session, let x be a variable of type xts and c1 a variable of type connection. The time series x can then be transferred to the ztsdb instance connected to c1, either by creating a new remote time series or by appending to an existing one. Both cases use the escape operator ++ (see Escape operator), which transfers local data to the remote instance as part of the query.

Creating: with the double assignment operator <<-, a new variable z is created in the global environment of the remote instance. The normal assignment operator <- can of course be used instead if z should remain in the local interpretation context (see Interpretation contexts).
```
c1 ? (z <<- ++x)
```
Appending: the following example assumes that a variable z of type zts already exists on the remote ztsdb. Its number of columns must match the number of columns of the xts x in the R session. Furthermore, the first row of x must have a timestamp strictly greater than the last index value of z, because the index of a time-series must be strictly increasing. Also note the use of the pass-by-reference operator -- on z, so that the data is appended to z in place:
```
c1 ? rbind(--z, ++x)
```
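
Before sending such an append, it can be useful to check the two preconditions above locally. The following C++ sketch is a hypothetical helper (can_append and Series are not ztsdb names); it represents timestamps as plain 64-bit integers and also checks that the index of x is itself strictly increasing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical minimal view of a time series: its index and column count.
struct Series {
  std::vector<std::int64_t> index;  // timestamps, expected strictly increasing
  std::size_t ncols;
};

// Returns an empty string if x can be appended to z, otherwise the reason.
std::string can_append(const Series& z, const Series& x) {
  if (z.ncols != x.ncols) return "column count mismatch";
  for (std::size_t i = 1; i < x.index.size(); ++i)
    if (x.index[i] <= x.index[i - 1]) return "new index not strictly increasing";
  if (!z.index.empty() && !x.index.empty() && x.index.front() <= z.index.back())
    return "first new timestamp not after last timestamp of z";
  return "";
}
```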

Live appends

Live appends can be done from C or C++ after establishing a standard TCP connection to a remote ztsdb instance. A few examples are provided with the ztsdb source code.

Appends can be made for the following types: double, time, duration, interval, bool, and of course for zts.

The C++ signature, defined in zcpp_stdlib.hpp, is the following:

```
Global::buflen_pair make_append_msg(const std::vector<std::string>& name, 
                                    const std::vector<Global::dtime>& idx, 
                                    const std::vector<double>& v);
```

The name parameter identifies the zts to which the data is appended. If the vector contains a single element, a variable of type zts with that name is looked up in the global environment and, if found, appended to. If it contains several elements, the first element is taken as the name of a list, which is again looked up in the global environment; each subsequent string selects an element of the list reached so far. This makes it possible to update a zts nested inside lists at arbitrary depth.
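
The lookup described above can be modelled as a path walk. In this hypothetical C++ sketch, Value is a toy stand-in that is either a zts or a named list, and resolve returns nullptr when the path does not end at a zts; it is not taken from the ztsdb sources.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy value: either a zts (is_zts) or a list of named elements.
struct Value {
  bool is_zts = false;
  std::map<std::string, std::shared_ptr<Value>> elements;
};

using GlobalEnv = std::map<std::string, std::shared_ptr<Value>>;

// First name: looked up in the global environment.
// Each further name: selects an element of the list reached so far.
std::shared_ptr<Value> resolve(const GlobalEnv& global,
                               const std::vector<std::string>& name) {
  if (name.empty()) return nullptr;
  auto it = global.find(name[0]);
  if (it == global.end()) return nullptr;
  std::shared_ptr<Value> cur = it->second;
  for (std::size_t i = 1; cur && i < name.size(); ++i) {
    if (cur->is_zts) return nullptr;  // a zts has no named elements
    auto el = cur->elements.find(name[i]);
    if (el == cur->elements.end()) return nullptr;
    cur = el->second;
  }
  if (cur && cur->is_zts) return cur;
  return nullptr;
}
```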

Note that idx and v can contain multiple entries, but they must describe the same number of rows. The elements of v are matched to the columns of the zts whatever its dimensionality (remember that a zts can have an arbitrary number of dimensions).
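
The row constraint amounts to a size check on the flat vector v: for n timestamps and a zts holding c values per row (the product of its non-time dimensions), v must contain n * c doubles. The C++ helper names below are hypothetical, and the order in which ztsdb expects those values within v is not covered by this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Values per row: product of the zts dimensions other than time.
std::size_t values_per_row(const std::vector<std::size_t>& non_time_dims) {
  return std::accumulate(non_time_dims.begin(), non_time_dims.end(),
                         std::size_t{1}, std::multiplies<std::size_t>());
}

// True if v holds exactly one value per column for each timestamp in idx.
bool append_shape_ok(std::size_t n_idx,
                     const std::vector<std::size_t>& non_time_dims,
                     std::size_t v_size) {
  return n_idx > 0 && v_size == n_idx * values_per_row(non_time_dims);
}
```

For a zts with 3 columns, two timestamps therefore require six values; for a zts whose non-time dimensions are 3 by 2, a single timestamp also requires six values.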

Logging

Logging is controlled by the options logfile.path and logfile.name, which give the directory and name of the log file, and log.level, which sets the verbosity (TRACE, DEBUG, INFO or ERROR). ztsdb logs events and errors, but not on the data path, where logging could cause an unacceptable performance degradation. Errors and events on the data path are instead recorded in the statistics and state information described in the next section.

Stats and info

Conceptually, ztsdb is divided into three layers:

- the net layer, which takes care of the TCP connections and the segmentation and reassembly of buffers;
- the message layer, which routes each message to the correct interpretation context;
- the context layer, which handles the interpretation of queries and the state of both incoming and outgoing requests.

For each of these layers there is a set of statistics and a set of state information.

Statistics can be obtained for each layer with:

```
stats.net()
stats.msg()
stats.ctx()
```

And state information can be obtained in a similar way:

```
info.net()
info.msg()
info.ctx()
```