Reference
Types
- Array types: double (64-bit floating point), logical (boolean), character (string), nanotime (date/time), nanoival (date/time interval), nanoduration (duration of time in nanoseconds), nanoperiod (a duration that can handle the variable durations of day, month and year)
- Composite types: zts (time-series), list
- Special types: connection (connection to another ztsdb instance), builtin (built-in function), function (user-defined function), error
Arrays
The various constructor functions (vector, matrix and array)
all yield the same array type: respectively 1-dimensional,
2-dimensional and >=3-dimensional arrays. In particular, a scalar is a
1-dimensional array of size 1. The following types are arrays (note in
particular the absence of an integer type):
double, logical, character, nanotime, nanoival, nanoduration, nanoperiod
The constructors are similar to R:
1 # vector of size 1
vector(mode="double", 3)
c(1, 2, 3, 4, 5)
### [1] 1 2 3 4 5
c(TRUE, FALSE, FALSE, TRUE)
### [1] TRUE FALSE FALSE TRUE
matrix(1:9, 3, 3, dimnames=list(c("i","ii","iii"), c("one","two","three")))
array(1:8, c(2,2,2), dimnames=list(NULL,NULL,c("one","two")))
One significant difference from R is that arrays (as well as
time-series) can be persistent. This is controlled by the file
argument, which indicates the name of the directory that will be
created to hold the memory-mapped files associated with the
array:
a <- array(1:2e6, c(1e6,2), file="/tmp/memory_mapped_array_directory")
The str function displays the memory-mapped directory if any:
str(a)
### displays:
### double - ord [1:1000000, 1:2] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...
### - mmap file = /tmp/memory_mapped_array_directory
Another feature of arrays is that ztsdb keeps track of
the ordering of each column. str indicates that all columns are
ordered by printing "ord". The function is.ordered can also be
used to determine this programmatically.
is.ordered(1:10) # TRUE
is.ordered(10:1) # FALSE
str(1:10)
### double - ord [1:10] 1 2 3 4 5 6 7 8 9 10
### - malloc-based
str(10:1)
### double [1:10] 10 9 8 7 6 5 4 3 2 1
### - malloc-based
Date/time representation
ztsdb has four built-in date and time related types. They are
nanotime, nanoival, nanoduration and nanoperiod.
nanotime
A nanotime is a specific point in time with nanosecond precision:
timepoint1 <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint2 <- as.nanotime("2009-01-01 13:12:00.000000001 America/New_York")
It is encoded as a signed 64-bit count of nanoseconds since 1970-01-01 UTC.
This means the nanotime range is approximately from year 1677 to year
2262. It does not have an associated time zone, but can be displayed
in any desired time zone with the print function. It follows the
POSIX convention; in particular, it does not have the notion of leap
seconds.
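To check the representable range, the Python sketch below converts the extreme values of a nanosecond offset from 1970-01-01 UTC back to calendar dates. It assumes the offset is stored in a signed 64-bit integer. Python's datetime only has microsecond resolution, so the bounds are truncated to microseconds.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1  # assumed storage: signed 64-bit

def ns_to_datetime(ns: int) -> datetime:
    """Convert a nanosecond offset from the epoch to a UTC datetime (microsecond precision)."""
    return EPOCH + timedelta(microseconds=ns // 1000)

earliest = ns_to_datetime(INT64_MIN)  # around 1677-09-21
latest = ns_to_datetime(INT64_MAX)    # around 2262-04-11
print(earliest.isoformat(), latest.isoformat())
```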
nanoival
A nanoival is represented by two points in time, and the start and
the end of the interval can each be either closed or open. In the string
constructor, a closed start (end) is indicated with the '+' sign and
means that the start (end) is included in the interval. An open
start (end) is indicated with the '-' sign and means that it is not included
in the interval. The constructor version that takes times accepts two
additional arguments, sopen and eopen, which, when
true, indicate that the start or the end of the interval, respectively, is
open. By default an interval has a closed start and an open end.
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
as.nanoival("-2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York+")
start <- |.2009-01-01 13:12:00 America/New_York.|
one_hour <- as.nanoduration("01:00:00")
end <- start + one_hour
### all the following produce the same interval:
nanoival(start, end) # by default sopen=F, eopen=T
nanoival(start, end, sopen=F, eopen=T)
nanoival(start, duration=one_hour)
It is encoded as two nanotime values with additional flags indicating whether
the start and end of the interval are open or closed. Accessors
are defined in order to access its components:
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
nanoival.start(ival) # |.2009-01-01 13:12:00 America/New_York.|
nanoival.end(ival) # |.2009-02-01 15:11:03 America/New_York.|
nanoival.sopen(ival) # FALSE
nanoival.eopen(ival) # TRUE
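The effect of the sopen and eopen flags on membership can be sketched in Python as follows. Interval is a hypothetical class, not ztsdb's representation. It uses plain integers in place of nanotime values and the same defaults as described above (closed start, open end).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """Simplified model of a nanoival: start/end instants plus open flags."""
    start: int
    end: int
    sopen: bool = False  # closed start by default
    eopen: bool = True   # open end by default

    def contains(self, t: int) -> bool:
        # An open bound excludes the bound itself; a closed bound includes it.
        after_start = t > self.start if self.sopen else t >= self.start
        before_end = t < self.end if self.eopen else t <= self.end
        return after_start and before_end

default = Interval(0, 10)                             # like |+start -> end-|
flipped = Interval(0, 10, sopen=True, eopen=False)    # like |-start -> end+|
```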
nanoduration
A nanoduration is a count of nanoseconds, which may be negative:
one_second <- as.nanoduration(1e9)
one_second <- as.nanoduration("00:00:01")
one_hour <- as.nanoduration("01:00:00")
one_nanosecond <- as.nanoduration("00:00:00.000_000_001")
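The string form used above appears to be HH:MM:SS with an optional fractional part, where underscores act as digit separators. A Python sketch of the conversion to a nanosecond count follows. It assumes exactly this format, plus a leading '-' for negative values, which is an assumption here.

```python
NS_PER_SECOND = 1_000_000_000

def parse_duration(text: str) -> int:
    """Parse "[-]HH:MM:SS[.fraction]" into a signed nanosecond count."""
    sign = -1 if text.startswith("-") else 1
    hms, _, frac = text.lstrip("+-").partition(".")
    hours, minutes, seconds = (int(p) for p in hms.split(":"))
    digits = frac.replace("_", "")                 # "000_000_001" -> "000000001"
    frac_ns = int(digits.ljust(9, "0")[:9]) if digits else 0
    return sign * ((hours * 3600 + minutes * 60 + seconds) * NS_PER_SECOND + frac_ns)
```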
nanoperiod
nanoperiod represents the calendar or "business" view of a duration with
the concepts of month and day. The exact duration of a period is
unknown until it is anchored to a point in time and associated with a
time-zone.
A nanoperiod is composed of two parts: a months/days part and a
duration. These two components may have opposite signs.
For convenience, the constructor syntax also allows specifying years and weeks, but they are converted to their representation in months and days.
### constructor from string:
one_month_one_day <- as.nanoperiod("1m1d")
one_day_minus_12_hours <- as.nanoperiod("1d/-12:00:00")
one_of_everything <- as.nanoperiod("1y1m1w1d/01:01:01.000_000_001")
### constructor from double and nanoduration arguments:
one_month_one_day <- nanoperiod(months=1, days=1)
one_day_minus_12_hours <- nanoperiod(days=1, duration=-12 * as.nanoduration("01:00:00"))
Accessors for the components of a nanoperiod are provided:
ones <- as.nanoperiod("1y1m1w1d/01:01:01.000_000_001")
nanoperiod.month(ones)        # 13: 1 year + 1 month
nanoperiod.day(ones)          # 8: 1 week + 1 day
nanoperiod.nanoduration(ones) # the 01:01:01.000_000_001 duration part
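Converting years and weeks into the months/days representation is simple arithmetic: 12 months per year and 7 days per week. The Python sketch below applies it to the calendar part of a period string, meaning the part before '/'. calendar_part is a hypothetical helper that only handles the y/m/w/d letters used in the examples above.

```python
import re

MONTHS_PER_UNIT = {"y": 12, "m": 1}
DAYS_PER_UNIT = {"w": 7, "d": 1}

def calendar_part(spec: str) -> tuple[int, int]:
    """Return (months, days) for the calendar part of a period string."""
    months = days = 0
    for count, unit in re.findall(r"(-?\d+)([ymwd])", spec.split("/")[0]):
        if unit in MONTHS_PER_UNIT:
            months += int(count) * MONTHS_PER_UNIT[unit]
        else:
            days += int(count) * DAYS_PER_UNIT[unit]
    return months, days
```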
Operations
Arithmetic operations
Arithmetic operations are straightforward. As in R, one can use either the infix notation or the functional notation, by enclosing the operator in backquotes.
1 + 1 == `+`(1, 1) # TRUE
Arithmetic operations on temporal types
Two nanodurations can be added or subtracted, and a nanoduration can be multiplied or divided by a double; in each case the result is a nanoduration.
one_second <- as.nanoduration("00:00:01")
one_second + one_second
### [1] 00:00:02
three_seconds <- 3 * one_second
three_seconds / 3
### [1] 00:00:01
A nanoduration can be added to or subtracted from a nanotime or a
nanoival, and the result is respectively a nanotime or a
nanoival.
one_second <- as.nanoduration("00:00:01")
timepoint <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint + one_second
### [1] 2009-01-01 13:12:01.000000001 EST
A nanoperiod can be added to or subtracted from a nanotime or a
nanoival, and the result is respectively a nanotime or a
nanoival.
The functional way of specifying the operators (`+` and `-`) is
needed for the addition/subtraction of a nanoperiod because an
additional time-zone argument must be specified. For example:
one_month <- as.nanoperiod("1m")
`+`(|.2009-01-01 13:12:00 America/New_York.|, one_month, "America/New_York")
### [1] 2009-02-01 13:12:00 EST
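The Python sketch below shows why a nanoperiod needs a time zone while a nanoduration does not. It compares adding one calendar day with adding 24 hours across the 2015-03-08 daylight-saving switch in New York. The hand-written offset rule is a hypothetical stand-in for a real time-zone database and only covers this single transition.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical, simplified rule for America/New_York around March 2015 only:
# EST (UTC-5) before 2015-03-08 02:00 local time, EDT (UTC-4) after.
# Nonexistent local times (02:00-03:00 on that day) are not handled.
DST_START = datetime(2015, 3, 8, 2, 0)

def to_utc(local: datetime) -> datetime:
    offset = timedelta(hours=-4) if local >= DST_START else timedelta(hours=-5)
    return (local - offset).replace(tzinfo=timezone.utc)

start_local = datetime(2015, 3, 7, 12, 0)
start_utc = to_utc(start_local)

# Period arithmetic: add one calendar day in local wall-clock time, then anchor.
plus_period = to_utc(start_local + timedelta(days=1))
# Duration arithmetic: add exactly 24 hours to the absolute instant.
plus_duration = start_utc + timedelta(hours=24)
```

Here the calendar day spans only 23 elapsed hours. Adding 24 hours instead lands at 13:00 local time on the next day.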
Subsetting and subassignment
Subsetting and subassignment work as in R. An index can be a logical,
double or character vector. Additionally, a
nanotime vector can be subsetted by either a nanotime or a nanoival
vector, and a nanoival vector can be subsetted by a nanoival
vector (see also Date/time intersection).
a[1:10, 1]
a[1:10, c(TRUE,FALSE)]
a[,]
a[1:10, ] <- a[1:20]
Set operations
The set operations intersect, union and setdiff are defined for
both nanotime and nanoival vectors. If these vectors are not ordered,
they are sorted before the operation is carried out. This means
that for unordered vectors there is a performance penalty that
should be taken into account.
Additionally, each of these set operations has a counterpart function,
intersect.idx, union.idx and setdiff.idx, that returns the index of
the resulting set instead of computing a new set.
Intersection
nanotime[nanoival]
nanoival[nanotime]
nanoival[nanoival]
ts[nanoival]
ts[nanotime]
or alternatively:
intersect(nanotime, nanoival)
intersect(nanoival, nanoival)
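Subsetting times by intervals, as in nanotime[nanoival], can be done with one linear merge pass once both inputs are sorted. That is why unordered inputs pay an extra sorting cost. The Python sketch below assumes half-open [start, end) intervals that do not overlap and integer time stamps; times_in_intervals is a hypothetical helper.

```python
def times_in_intervals(times, intervals):
    """Keep the times that fall in any half-open [start, end) interval.

    Both inputs are sorted first (the O(n log n) penalty for unordered input);
    the merge itself is a single linear pass. Intervals must not overlap.
    """
    times = sorted(times)
    intervals = sorted(intervals)
    out, i = [], 0
    for t in times:
        # Skip intervals that end at or before t.
        while i < len(intervals) and intervals[i][1] <= t:
            i += 1
        if i == len(intervals):
            break
        if intervals[i][0] <= t:
            out.append(t)
    return out
```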
Union
union(nanoival, nanoival) # gives back the minimal interval set
union(nanotime, nanotime)
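The "minimal interval set" returned by the union of nanoivals can be pictured as sorting the intervals and merging every one that overlaps or touches its predecessor. This Python sketch is a simplification. It assumes half-open [start, end) intervals and ignores the open/closed flags, which in a real nanoival decide whether two touching intervals merge.

```python
def union_intervals(intervals):
    """Merge half-open [start, end) intervals into the minimal sorted set."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the last merged interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```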
Difference
setdiff(nanotime, nanotime)
setdiff(nanoival, nanoival)
setdiff(nanotime, nanoival)
Calendar operations
Rounding is defined for nanotime and nanoival, with the following set
of constants: "second", "minute", "hour", "day", "week", "month",
"quarter", "year". For all constants that require the computation to
take into account daylight saving time, the time zone argument tz
is required. For example:
round(nanotime, "day", tz="Europe/London")
round(nanotime, "minute")
round(nanoival, "month", tz="America/New_York")
round(nanoival, "second")
Functions that extract calendar fields of a nanotime as integer values are defined:
dayweek(Sys.time(), "America/New_York") # 0 to 6 (0 is Sunday)
daymonth(Sys.time(), "America/New_York") # 1 to 31
dayyear(Sys.time(), "America/New_York") # 1 to 366
month(Sys.time(), "America/New_York") # 1 to 12
year(Sys.time(), "America/New_York")
The function dist gives the distance in calendar periods between two dates. It returns a double
which indicates the number of units of the chosen calendar period:
dist(|.2009-01-01 13:12:00 America/New_York.|, |.2009-01-01 13:12:00 America/New_York.|, "day", "America/New_York")
Generating sequences
ztsdb provides a function similar to R's seq function. Temporal
sequences (either nanotime or nanoival) can be created with a by
argument that can be either a nanoduration or a nanoperiod. In the case
where it is a period, the tz argument must be specified in order to
associate a time-zone to the operations.
one_day <- as.nanoperiod("1d")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2016-01-01 13:12:00 America/New_York.|,
by=one_day, tz="America/New_York")
one_second <- as.nanoduration("00:00:01")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2009-01-02 13:12:00 America/New_York.|,
by=one_second)
seq(from=|+2009-01-01 13:00:00 America/New_York -> 2009-01-01 15:00:00 America/New_York-|,
to= |+2010-01-01 13:00:00 America/New_York -> 2010-01-01 15:00:00 America/New_York-|,
by=one_day, tz="America/New_York")
CSV read/write
As in R, the functions read.csv and write.csv are provided. ztsdb
does not adhere strictly to RFC 4180. In particular, we use (and
expect) LF and not CRLF line endings. And although we allow quoted elements, we
don't allow the separator to appear in a string. We believe these
functions to be mostly useful for time-series, where this limitation
has little impact.
Rolling functions
These functions "roll" over each column, calculating a value over a
given window of observations. They can be used on either double or
zts. Their signature is:
function(x, window, nvalid=window)
x is the double or zts, window is an integer that defines the
number of observations over which the operation is performed, and
nvalid is the number of non-NaN observations needed to consider a
result valid. For example, a window of 10 and an nvalid of 5 mean
that if 5 or more non-NaN observations exist in the window, the
result is computed; otherwise it is set to NaN. The
functions are rollmean, rollmin, rollmax, rollvar and
rollcov. See an example of the usage of rollcov
here.
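The window/nvalid rule can be sketched in Python for rollmean on a single column. The sketch assumes a trailing window that includes the current observation. It also assumes that windows truncated at the start of the column count only the observations that exist. Neither point is stated in the text.

```python
import math

def rollmean(x, window, nvalid=None):
    """Trailing rolling mean over `window` observations (current one included).

    A result is produced only when at least `nvalid` non-NaN values are in the
    window; otherwise the result is NaN.
    """
    nvalid = window if nvalid is None else nvalid
    out = []
    for i in range(len(x)):
        valid = [v for v in x[max(0, i - window + 1): i + 1] if not math.isnan(v)]
        out.append(sum(valid) / len(valid) if valid and len(valid) >= nvalid else math.nan)
    return out

nan = math.nan
col = [1.0, 2.0, nan, 4.0, 5.0]
```

With window=3 and nvalid=2, every window after the first holds at least two valid values and produces a mean. With the default nvalid=window, every window in col contains a NaN or is truncated, so every result is NaN.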
Array and zts transformation
These functions transform a double or a zts column-wise.
locf
Last observation carried forward. A non-NaN observation is carried forward to fill in a NaN observation if the non-NaN and NaN observations are within the same window, whose size is specified by n. The signature of this function is:
function(x, n)
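Here is a Python sketch of locf on one column. It assumes "the same window" means that the NaN lies fewer than n positions after the last non-NaN observation. The exact boundary convention is a guess.

```python
import math

def locf(x, n):
    """Carry the last non-NaN value forward into NaN slots that lie fewer than
    n positions after it; NaN slots further away stay NaN."""
    out = list(x)
    last_i = None
    for i, v in enumerate(x):
        if not math.isnan(v):
            last_i = i
        elif last_i is not None and i - last_i < n:
            out[i] = x[last_i]
    return out
```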
move
Moves all observations down or up depending on the value of n: a positive n moves them down, while a negative n moves them up. NaN is assigned to the positions left empty by the move (at the beginning or at the end of the columns, depending on the direction of the move). The signature is:
function(x, n)
rotate
Works like the move function, but the observations wrap around, so no NaN is produced. The signature is:
function(x, n)
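The difference between move and rotate on a single column can be sketched in Python. Positive n shifts toward the end of the column ("down") and negative n toward the start, as described above.

```python
import math

def move(col, n):
    """Shift observations by n positions; vacated slots become NaN."""
    out = [math.nan] * len(col)
    for i, v in enumerate(col):
        j = i + n
        if 0 <= j < len(col):
            out[j] = v
    return out

def rotate(col, n):
    """Shift observations by n positions with wrap-around (no NaN produced)."""
    if not col:
        return []
    out = [None] * len(col)
    for i, v in enumerate(col):
        out[(i + n) % len(col)] = v   # Python's % handles negative n
    return out
```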
rev
Reverses, still column-wise, an array or a list. The signature is:
function(x)
Cumulative functions
These functions accumulate values. The rev parameter controls the direction of accumulation. The functions of this group are cumsum, cumprod, cumdiv, cummax and cummin. Their signature is:
function(x, rev=FALSE)
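A Python sketch of the rev parameter follows. It assumes rev=TRUE accumulates from the last element toward the first, with results kept in the original positions. cumulate is a hypothetical generic helper, and only cumsum and cummax are shown.

```python
import operator
from itertools import accumulate

def cumulate(x, op, rev=False):
    """Cumulative reduction of x with the binary function op."""
    if not rev:
        return list(accumulate(x, op))
    # Accumulate from the end, then restore the original order.
    return list(accumulate(reversed(x), op))[::-1]

def cumsum(x, rev=False):
    return cumulate(x, operator.add, rev)

def cummax(x, rev=False):
    return cumulate(x, max, rev)
```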
Aggregate functions
sum and prod, which compute respectively the sum and the product of
all elements of a vector, matrix or array, are provided and work as in R.
Time-series
zts, the time-series type, is composed of a nanotime vector and a
double array. The length of the nanotime index is the same as the
first dimension of the array of double. This means that each nanotime
element is associated with a "horizontal" slice of the array of
double. This first dimension has the same special time subsetting
capabilities as the nanotime type.
Creation
A time-series is created with a nanotime vector and a corresponding
double array (i.e. the length of the first dimension of the array is the
same as the length of the nanotime vector). Note that like arrays, time
series can have an arbitrary number of dimensions. And just like
arrays, a time-series can be memory-mapped by supplying the optional
argument file which indicates where the memory-mapped files will be
written.
idx <- c(|.2015-03-09 06:38:01 America/New_York.|,
|.2015-03-09 06:38:02 America/New_York.|,
|.2015-03-09 06:38:03 America/New_York.|)
z <- zts(idx, matrix(1:6, 3, 2, dimnames=list(NULL, c("one", "two"))), file="memory_mapped_dir")
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
### 2015-03-09 06:38:03.000000000 EDT 3 6
Note that the dim argument can be omitted in the case of a
two-dimensional time-series, as the size can be calculated from the
length of the vector.
Accessors
zts is an aggregate type and its components can be accessed with the
following functions:
zts.idx(z)
zts.data(z)
Operations on time-series
Subset and subassign operations are defined similarly to double, and
the first dimension follows the indexation semantics of a nanotime
vector.
ivl <- |+2015-03-09 06:38:01 America/New_York -> 2015-03-09 06:38:02 America/New_York+|
z[ivl,]
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
Arithmetic operations are defined as for double:
z + z
### one two
### 2015-03-09 06:38:01.000000000 EDT 2 8
### 2015-03-09 06:38:02.000000000 EDT 4 10
### 2015-03-09 06:38:03.000000000 EDT 6 12
The bind family of functions is also defined, but note that a zts
index must remain strictly sorted (and consequently with unique
values).
Align operations
The function align has the following signature:
align(from, to, start=as.nanoduration(0), end=as.nanoduration(0), method="closest", tz=NULL)
It aligns the observations of the zts from onto the vector of
nanotime to, effectively yielding a new time-series that has the
vector to as time index.
The arguments start and end define an interval which is used
to pick a value out of from. The alignment algorithm is the
following: for each time t in to, define the interval i
[t + start; t + end[ (note that the start is closed whereas the end is open,
i.e. t + end is not part of the interval). For each i so defined, pick a
value out of from that is computed over the values of from that fall
in that interval.
start and end can either be a nanoduration or a nanoperiod. If one of
the two is a nanoperiod, then tz must be defined in order to give
meaning to the interval.
The argument method controls which value is picked out of
from for a given value of to and can have the values:
- closest: pick the observation that is closest to the time t in to; note that this method considers i to be the closed interval [t + start; t + end]
- count: count the number of observations in from that fall in i
- min: pick the observation with the smallest value
- max: pick the observation with the largest value
- mean: compute the mean of the observations
- median: compute the median of the observations
[Figure: visualization of align(t1, t2, -one_hour, "closest")]
[Figure: visualization of align(t1, t2, -one_hour, "count")]
### create a zts for the example:
one_second <- as.nanoduration("00:00:01")
idx <- seq(|.2015-01-01 12:00:00 America/New_York.|,
|.2015-02-01 12:00:00 America/New_York.|,
by=one_second)
z <- zts(idx, data=0:(length(idx)-1))
### create a vector of nanotime onto which z will be aligned:
one_hour <- as.nanoduration("01:00:00")
to <- seq(|.2015-01-01 12:00:00 America/New_York.|,
|.2015-02-01 00:00:00 America/New_York.|,
by=one_hour)
align(z, to, -one_hour, method="count") # the values of this zts will be 3600
align(z, to, -one_hour, method="closest") # the values of this zts will be 0, 3600, 7200, ...
Additionally, the function align.idx is provided and has the signature:
align.idx(from, to, start=as.nanoduration(0), end=as.nanoduration(0), tz=NULL)
This function makes a "closest" align and instead of returning a
time-series, it returns the index of the values in from.
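The alignment algorithm can be sketched in Python with binary search over a sorted list of time stamps. The sketch is an illustration under stated assumptions, not ztsdb's implementation. It uses the window [t + start, t + end[, closed at both ends for "closest". Plain integers stand in for nanotime values. Empty windows yield NaN (an assumption), and ties in "closest" go to the earlier observation.

```python
import bisect
import math
import statistics

REDUCERS = {"min": min, "max": max, "mean": statistics.mean, "median": statistics.median}

def align(from_times, from_values, to, start=0, end=0, method="closest"):
    """Align (from_times, from_values) onto the times in `to`; from_times must be sorted."""
    out = []
    for t in to:
        lo = bisect.bisect_left(from_times, t + start)          # closed start
        if method == "closest":
            hi = bisect.bisect_right(from_times, t + end)       # closed end
            if lo == hi:
                out.append(math.nan)
            else:
                k = min(range(lo, hi), key=lambda j: abs(from_times[j] - t))
                out.append(from_values[k])
            continue
        hi = bisect.bisect_left(from_times, t + end)            # open end
        window = from_values[lo:hi]
        if method == "count":
            out.append(float(len(window)))
        else:
            out.append(REDUCERS[method](window) if window else math.nan)
    return out

times = list(range(11))            # one observation per time unit: 0..10
values = [float(v) for v in times]
```

With start=-3, each point t of to collects the observations at t-3, t-2 and t-1. This mirrors the -one_hour example above at a smaller scale.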
op.zts operation
op.zts performs arithmetic operations between two time-series and
has the following signature, where x and y are time-series and
op is a string:
op.zts(x, y, op)
Each entry in the left time-series operand defines an interval starting at
the previous entry, and the value associated with this interval is
applied to all the observations in the right time-series operand
that fall in the interval. Note that the interval is closed at the
beginning and open at the end. The available values for op are
"*", "/", "+", "-".
[Figure: visualization of op.zts(t1, t2, "*")]
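One reading of the op.zts description can be sketched in Python: the value of the entry x[k] applies to the observations of y in [x[k-1], x[k][. This is an interpretation, not a confirmed behaviour. The sketch makes three further assumptions: the operand order is x op y, the first entry of x defines no interval, and uncovered observations of y become NaN.

```python
import bisect
import math
import operator

OPS = {"*": operator.mul, "/": operator.truediv, "+": operator.add, "-": operator.sub}

def op_zts(x_times, x_values, y_times, y_values, op):
    """Combine each y observation with the x value whose interval contains it."""
    f = OPS[op]
    out = []
    for s, v in zip(y_times, y_values):
        k = bisect.bisect_right(x_times, s)   # x_times[k-1] <= s < x_times[k]
        if 1 <= k < len(x_times):
            out.append(f(x_values[k], v))     # operand order: assumption
        else:
            out.append(math.nan)              # not covered by any interval
    return out
```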
Connecting and querying
Connection
A connection is a handle to a remote ztsdb instance. The underlying
protocol of a connection is TCP. It is created like this:
c1 <- connection(host="127.0.0.1", port=19001)
A connection is created only if the connection was successfully
established with the remote instance.
With a connection it is possible to run any code remotely using the
? (query) operator:
c1 ? 1 # evaluate 1 remotely
c1 ? 1 + 1 # evaluate 1+1 remotely
c1 ? a <<- array(1:27, c(3,3,3)) # create 'a' remotely in the global environment
c1 ? a # get 'a'
c1 ? a[1, 2, 1]
c1 ? a[,1,1]
c1 ? { b <- 2; a * b } # create 'b' in the remote context environment
# and send back 'a * b'
Escape operator
It is also possible to escape code with the ++ operator, so that it
is evaluated locally before being sent remotely as part of the query:
la <- 2
c1 ? ++la * a # take 'la' locally, send it over to the remote instance,
# multiply it by the remote 'a' and send result back
c1 ? ++{ lb <- 2; lc <- 3; lb * lc } * a # 6 * 'a' where 6 is evaluated locally
More complicated schemes are possible, such as defining remote handles, remote escapes, etc.
Synchronous and asynchronous queries
A query is immediately dispatched to the remote instance for interpretation. Locally, a future is created as a placeholder for the result of the query. Execution then continues until the value of the future is needed, i.e. when it is used in an expression (or needs to be returned as the result of a query). It is therefore possible to make a query synchronous or asynchronous by, respectively, using or not using its result.
### synchronous:
a <- (c1 ? x) + (c2 ? y) # sync'd by the '+'; the queries go out in parallel to 'c1' and 'c2'
### asynchronous:
{ c1 ? x; c2 ? y; NULL } # the results of the queries to 'c1' and 'c2' are never used
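The same future-based behaviour can be mimicked with Python's concurrent.futures, purely as an analogy. remote_query is a hypothetical stand-in for a remote evaluation. submit plays the role of the ? operator: both queries start at once. Calling result() marks the point where the value is needed, like the '+' above.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_query(value, delay):
    """Hypothetical stand-in for a remote evaluation taking `delay` seconds."""
    time.sleep(delay)
    return value

with ThreadPoolExecutor(max_workers=2) as pool:
    t0 = time.perf_counter()
    fx = pool.submit(remote_query, 1, 0.3)   # like c1 ? x: dispatched immediately
    fy = pool.submit(remote_query, 2, 0.3)   # like c2 ? y: runs concurrently
    a = fx.result() + fy.result()            # like '+': waits for both results
    elapsed = time.perf_counter() - t0
```

Because the two calls overlap, the total wait is close to one delay rather than two.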
Timers
It is possible to repeat the execution of code at regular intervals. A timer creates a new interpretation context, which is torn down when the timer is destroyed. To keep the timer and its context alive, a timer can be declared in global scope.
A timer has the following signature:
function(nanoduration, loop, once=NULL, loop_max=0)
loop_max indicates the number of repetitions. A value of 0
indicates infinite repetitions. The once argument takes an
expression that is evaluated only once. It is useful for example for
creating local variables and setting up the job that will be done by
the loop code. The latter is an expression that will be run at each
timer expiry.
Timers are useful for a large variety of tasks: data distribution and backup, data transformation, etc. The following example calculates mean-minutes and stores them into a time-series available for querying.
Built-in functions
This source file has a list of the built-in functions together with their signatures and the allowable parameter types. For most of these functions, the functionality and parameters are the same as in R.
Environments and assignments
Environments
ztsdb has a notion of environment, but environments are not a first-class type as in R. Another difference is that ztsdb has dynamic scoping (see Scoping).
The environment hierarchy is the following: "base" <- "global" <- ... <- "current"
- "base" is the top-level parent environment and contains all the built-in functions
- "global" is the parent of all interpretation context environments, and the place where global variables (e.g. time-series that should be shared between different interpretation contexts) should be defined
- the ellipsis stands for a set of 0..n environments
- the environment after "global" is always the interpretation context environment
Managing environment content
The functions that help manage an environment's content are
assign, get, ls and rm (and its synonym remove), and
they work roughly as in R. For convenience, the function lsg is
provided; it is the same as ls except that name is by default
initialized to "global", so by default it lists the content of the
global environment.
The signatures are:
assign(x, value, envir="current", inherits=FALSE)
get(x, envir="current", inherits=FALSE)
ls(name="current")
lsg(name="global")
rm(..., list=character(), envir="current", inherits=FALSE)
Assignments
Simple assign
The simple assign operator (<-) always assigns to a variable in the
current environment. This means that a variable created by the simple
assign will never be visible to another interpretation context. If a
variable is created in a function, then the variable is local to
this function.
Special assign
The special assign operator (<<-) works like in R. It looks in the
current environment for the variable, and if not found it examines up
to the parent environment and so on. If it doesn't find a pre-existing
variable and it gets to the "global" environment, then it creates a
new variable there and makes the assignment.
Caution must be exercised when using the special assign to declare
global variables, because it might result in an assignment unwittingly
occurring in a child of the "global" environment. A safer way to
achieve global assignment is to use the assign function, specifying
the parameter envir as "global":
a <<- 123 # dangerous if 'a' is already defined in a child environment
assign("a", 123, envir="global") # safe
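The risk described above follows from the lookup rule itself. The Python sketch below models it: Env and special_assign are hypothetical names, and the code models the rule as described here, not ztsdb's implementation. The search starts in the current environment and walks up the parents. The value is assigned wherever the name is found first, or created in "global" if the search reaches it without a match.

```python
class Env:
    def __init__(self, name, parent=None):
        self.name, self.parent, self.vars = name, parent, {}

def special_assign(env, name, value):
    """Model of '<<-': assign where `name` is first found walking up from env,
    or create it in the "global" environment if it is not found before that."""
    e = env
    while True:
        if name in e.vars or e.name == "global":
            e.vars[name] = value
            return e
        e = e.parent

base = Env("base")
glob = Env("global", base)
context = Env("context", glob)
func = Env("function", context)

context.vars["a"] = 1
special_assign(func, "a", 123)   # lands in 'context', not in 'global'
special_assign(func, "b", 5)     # not found below global: created there
```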
Errors
Compared to R, ztsdb has a simplified mechanism for handling errors,
and there is no concept of warning. The message of the last error can
only be retrieved via .Last.error.
As in R, ztsdb implements a try/catch mechanism, which can be used like this:
## the following returns -1
error_string <- "invalid type for binary operator (double + string)"
a <- tryCatch(1 + "a", if (.Last.error==error_string) -1)
a # 'a' has value -1
Durability
Persistent objects are arrays and zts that are declared with the
file parameter (see Arrays). These
objects are memory-mapped to a set of files in the directory
indicated by file. To bring these files to a deterministic state, the function
msync is provided. It has the signature:
function(x, async=FALSE)
The async parameter determines if the operation is asynchronous or
not.
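As an analogy for what msync provides, Python's mmap.flush() calls msync() on POSIX systems. It forces the modified pages of a memory-mapped file back to the file. The file name and size below are arbitrary, and the sketch only shows the synchronous case.

```python
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "array.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 16)           # pre-size the file: an empty file cannot be mapped

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 16) as mm:
        mm[0:4] = b"ztsd"           # modify the mapped memory
        mm.flush()                  # synchronously write dirty pages to the file

with open(path, "rb") as f:
    content = f.read()
```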