Reference
Types
- Array types
  - double (64-bit floating point)
  - logical (boolean)
  - character (string)
  - nanotime (date/time)
  - nanoival (date/time interval)
  - nanoduration (duration of time in nanoseconds)
  - nanoperiod (a duration that can handle the variable durations of day, month and year)
- Composite types
  - zts (time-series)
  - list
- Special types
  - connection (connection to another ztsdb instance)
  - builtin (built-in function)
  - function (user-defined function)
  - error
Arrays
The various constructor functions (vector, matrix and array) all yield
the same array type, respectively 1-dimensional, 2-dimensional and
>=3-dimensional arrays. In particular, a scalar is a 1-dimensional
array of size 1. The following types are arrays (note in particular
the absence of an integer type):
double
logical
character
nanotime
nanoival
nanoduration
nanoperiod
The constructors are similar to R:
1 # vector of size 1
vector(mode="double", 3)
c(1, 2, 3, 4, 5)
### [1] 1 2 3 4 5
c(TRUE, FALSE, FALSE, TRUE)
### [1] TRUE FALSE FALSE TRUE
matrix(1:9, 3, 3, dimnames=list(c("i","ii","iii"), c("one","two","three")))
array(1:8, c(2,2,2), dimnames=list(NULL,NULL,c("one","two")))
One significant difference with R is that arrays (as well as
time-series) can be persistent. This is controlled by the file
argument which indicates the name of the directory that will be
created in order to hold the memory mapped files associated with the
array:
a <- array(1:2e6, c(1e6,2), file="/tmp/memory_mapped_array_directory")
The str function displays the memory-mapped directory, if any:
str(a)
### displays:
### double - ord [1:1000000, 1:2] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...
### - mmap file = /tmp/memory_mapped_array_directory
Another feature of arrays is that ztsdb keeps track of whether each
column is ordered. str indicates that all columns are ordered by
printing "ord". The function is.ordered can also be used to determine
this programmatically.
is.ordered(1:10) # TRUE
is.ordered(10:1) # FALSE
str(1:10)
### double - ord [1:10] 1 2 3 4 5 6 7 8 9 10
### - malloc-based
str(10:1)
### double [1:10] 10 9 8 7 6 5 4 3 2 1
### - malloc-based
Date/time representation
ztsdb has four built-in date and time related types: nanotime,
nanoival, nanoduration and nanoperiod.
nanotime
A nanotime is a specific point in time with nanosecond precision:
timepoint1 <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint2 <- as.nanotime("2009-01-01 13:12:00.000000001 America/New_York")
It is encoded as the nanosecond offset since 1970-01-01 UTC. With a
signed 64-bit count of nanoseconds, this means the nanotime range is
approximately from year 1677 to year 2262. It does not have an
associated time zone, but can be displayed in any desired time zone
with the print function. It follows the POSIX convention; in
particular, it does not have the notion of leap seconds.
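The representable range follows directly from the width of the encoding. The Python sketch below computes the earliest and latest instants of a signed 64-bit nanosecond count; the 64-bit signed storage is an assumption, not something this section states. Python's datetime only has microsecond resolution, so the nanoseconds are floored to microseconds.

```python
from datetime import datetime, timedelta

# Assumption: a nanotime is a signed 64-bit count of nanoseconds since the epoch.
EPOCH = datetime(1970, 1, 1)  # naive datetime standing in for 1970-01-01 UTC
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1

def ns_to_datetime(ns: int) -> datetime:
    # datetime has microsecond resolution, so floor the nanoseconds to microseconds
    return EPOCH + timedelta(microseconds=ns // 1000)

earliest = ns_to_datetime(INT64_MIN)
latest = ns_to_datetime(INT64_MAX)
print(earliest)  # 1677-09-21 00:12:43.145224
print(latest)    # 2262-04-11 23:47:16.854775
```

Each bound sits about 292 years away from 1970, since 2^63 nanoseconds is roughly 292.3 years.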
nanoival
A nanoival is represented by two points in time, and the start and the
end of the interval can each be either closed or open. In the string
constructor, a closed start (end) is indicated with the '+' sign and
means that the start (end) is included in the interval. An open start
(end) is indicated with the '-' sign and means that it is not included
in the interval. For the constructor version that takes times, two
additional arguments may be specified, sopen and eopen, which, when
true, indicate that respectively the start or the end of the interval
is open. By default an interval has a closed start and an open end.
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
as.nanoival("-2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York+")
start <- |.2009-01-01 13:12:00 America/New_York.|
one_hour <- as.nanoduration("01:00:00")
end <- start + one_hour
### all the following produce the same interval:
nanoival(start, end) # by default sopen=F, eopen=T
nanoival(start, end, sopen=F, eopen=T)
nanoival(start, duration=one_hour)
It is encoded as two nanotime values with additional flags indicating
whether the beginning and the end of the interval are open or closed.
Accessors are defined in order to access its components:
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
nanoival.start(ival) # |.2009-01-01 13:12:00 America/New_York.|
nanoival.end(ival) # |.2009-02-01 15:11:03 America/New_York.|
nanoival.sopen(ival) # FALSE
nanoival.eopen(ival) # TRUE
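The effect of the sopen and eopen flags on membership can be modelled in a few lines of Python. This is a sketch, not ztsdb code: the class name Ival and its contains method are hypothetical, and times are plain integers standing in for nanosecond offsets. The defaults mirror the closed-start, open-end default described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ival:
    """Hypothetical model of a nanoival: integer times plus open/closed flags."""
    start: int
    end: int
    sopen: bool = False  # default: closed start (start is included)
    eopen: bool = True   # default: open end (end is excluded)

    def contains(self, t: int) -> bool:
        after_start = t > self.start if self.sopen else t >= self.start
        before_end = t < self.end if self.eopen else t <= self.end
        return after_start and before_end

one_hour = 3_600_000_000_000  # nanoseconds
ival = Ival(0, one_hour)      # like nanoival(start, end) with the defaults
print(ival.contains(0), ival.contains(one_hour))  # True False
```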
nanoduration
A nanoduration is a count of nanoseconds, which may be negative:
one_second <- as.nanoduration(1e9)
one_second <- as.nanoduration("00:00:01")
one_hour <- as.nanoduration("01:00:00")
one_nanosecond <- as.nanoduration("00:00:00.000_000_001")
nanoperiod
A nanoperiod represents the calendar or "business" view of a duration,
with the concepts of month and day. The exact duration of a period is
unknown until it is anchored to a point in time and associated with a
time zone.
A nanoperiod is composed of two parts: a months/days part and a
duration. These two components may have opposite signs.
Note that, for convenience, the constructor syntax allows specifying years and weeks, but they are converted to their representation in months/days.
### constructor from string:
one_month_one_day <- as.nanoperiod("1m1d")
one_day_minus_12_hours <- as.nanoperiod("1d/-12:00:00")
one_of_everything <- as.nanoperiod("1y1m1w1d/01:01:01.000_000_001")
### constructor from double and nanoduration arguments:
one_month_one_day <- nanoperiod(months=1, days=1)
one_day_minus_12_hours <- nanoperiod(days=1, duration=as.nanoduration("-12:00:00"))
Accessors for the components of a nanoperiod are provided:
ones <- as.nanoperiod("1y1m1w1d/01:01:01.000_000_001")
nanoperiod.month(ones)
nanoperiod.day(ones)
nanoperiod.nanoduration(ones)
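To see why a period has no fixed length until it is anchored, the Python sketch below adds a months/days/duration triple to a calendar date. It is a simplified model, not ztsdb's implementation: it works on naive wall-clock datetimes (so it ignores the time zone and daylight saving time that ztsdb takes into account), applies months, then days, then the duration, and clamps the day of month when the target month is shorter. The Period name, the ordering and the clamping rule are all assumptions.

```python
import calendar
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Period:
    """Hypothetical model of a nanoperiod: a months/days part plus a duration."""
    months: int = 0
    days: int = 0
    duration_ns: int = 0

def add_period(t: datetime, p: Period) -> datetime:
    # 1. move the calendar month; clamp the day (assumption): Jan 31 + 1m -> Feb 28
    year, month0 = divmod(t.year * 12 + (t.month - 1) + p.months, 12)
    month = month0 + 1
    day = min(t.day, calendar.monthrange(year, month)[1])
    t = t.replace(year=year, month=month, day=day)
    # 2. add whole calendar days, then the fixed duration (floored to microseconds)
    return t + timedelta(days=p.days, microseconds=p.duration_ns // 1000)

one_month = Period(months=1)
jan = datetime(2009, 1, 1, 13, 12)
feb = add_period(jan, one_month)
mar = add_period(feb, one_month)
print(feb - jan, mar - feb)  # 31 days, 0:00:00 28 days, 0:00:00
```

The same "1m" period lasts 31 days in one place and 28 days in the next, which is the variability the nanoperiod type is designed to capture.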
Operations
Arithmetic operations
Arithmetic operations are straightforward. As in R, one can use either the infix notation or the functional notation, by enclosing the operator in backquotes.
1 + 1 == `+`(1, 1) # TRUE
Arithmetic operations on temporal types
Nanodurations can be added and subtracted, and a nanoduration can be
multiplied or divided by a number; the result is a nanoduration.
one_second <- as.nanoduration("00:00:01")
one_second + one_second
### [1] 00:00:02
three_seconds <- 3 * one_second
three_seconds / 3
### [1] 00:00:01
A nanoduration can be added to or subtracted from a nanotime or a
nanoival, and the result is respectively a nanotime or a nanoival.
one_second <- as.nanoduration("00:00:01")
timepoint <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint + one_second
### [1] 2009-01-01 13:12:01.000000001 EST
A nanoperiod can be added to or subtracted from a nanotime or a
nanoival, and the result is respectively a nanotime or a nanoival.
The functional form of the operators (`+` and `-`) is needed for the
addition/subtraction of a nanoperiod because an additional time-zone
argument must be specified. For example:
one_month <- as.nanoperiod("1m")
`+`(|.2009-01-01 13:12:00 America/New_York.|, one_month, "America/New_York")
### [1] 2009-02-01 13:12:00 EST
Subsetting and subassignment
Subsetting and subassignment work as in R. An index can be a logical
vector, a double vector or a character vector. Additionally, a
nanotime vector can be subsetted by either a nanotime or a nanoival
vector, and a nanoival vector can be subsetted by a nanoival vector
(see also Date/time intersection).
a[1:10, 1]
a[1:10, c(TRUE,FALSE)]
a[,]
a[1:10, ] <- a[1:20]
Set operations
The set operations intersect, union and setdiff are defined for both
nanotime and nanoival vectors. If these vectors are not ordered, they
will be sorted before the operation is carried out. This means that
for unordered vectors there is a performance penalty for these
operations that must be taken into consideration.
Additionally, each of these set operations has a counterpart function,
intersect.idx, union.idx and setdiff.idx, that returns the index of
the set instead of computing a new set.
Intersection
nanotime[nanoival]
nanoival[nanotime]
nanoival[nanoival]
ts[nanoival]
ts[nanotime]
or alternatively:
intersect(nanotime, nanoival)
intersect(nanoival, nanoival)
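When both operands are ordered, intersecting times with intervals needs only a single linear pass, which is why unordered vectors are sorted first. The Python sketch below shows the idea for intersect(nanotime, nanoival). It is not ztsdb's implementation: times are plain integers, the function name is hypothetical, and the intervals are assumed to be closed-start/open-end, sorted and non-overlapping.

```python
def intersect_times_ivals(times, ivals):
    """Keep the times that fall in any [start, end) interval.

    Assumes `times` is sorted and `ivals` is sorted by start and non-overlapping.
    """
    result = []
    i = 0
    for t in times:
        # skip intervals ending at or before t; since the times only increase,
        # those intervals cannot contain any later time either
        while i < len(ivals) and ivals[i][1] <= t:
            i += 1
        if i == len(ivals):
            break
        start, end = ivals[i]
        if start <= t < end:
            result.append(t)
    return result

print(intersect_times_ivals([1, 2, 5, 7, 9], [(2, 6), (9, 10)]))  # [2, 5, 9]
```

Each time and each interval is visited at most once, so the cost is linear in the combined length once both inputs are sorted.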
Union
union(nanoival, nanoival) # gives back the minimal interval set
union(nanotime, nanotime)
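The "minimal interval set" returned by union(nanoival, nanoival) can be illustrated with the classic sort-and-merge algorithm. This Python sketch is not ztsdb code: it assumes every interval is closed at the start and open at the end (ztsdb also tracks open starts and closed ends, which this model ignores), so intervals that merely touch, such as [5, 8) and [8, 9), merge into one.

```python
def union_ivals(ivals):
    """Merge [start, end) intervals into the minimal set of disjoint intervals."""
    merged = []
    for start, end in sorted(ivals):  # the sort is the cost paid by unordered input
        if merged and start <= merged[-1][1]:
            # overlaps or touches the previous interval: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(union_ivals([(5, 8), (1, 3), (2, 4), (8, 9)]))  # [(1, 4), (5, 9)]
```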
Difference
setdiff(nanotime, nanotime)
setdiff(nanoival, nanoival)
setdiff(nanotime, nanoival)
Calendar operations
Rounding is defined for nanotime and nanoival, with the following set
of constants: "second", "minute", "hour", "day", "week", "month",
"quarter", "year". For all constants that require the computation to
take daylight saving time into account, the time zone argument tz is
required. For example:
round(nanotime, "day", tz="Europe/London")
round(nanotime, "minute")
round(nanoival, "month", tz="America/New_York")
round(nanoival, "second")
Conversion of calendar periods to an integer value is defined for nanotime
objects:
dayweek(Sys.time(), "America/New_York") # 0 to 6 (0 is Sunday)
daymonth(Sys.time(), "America/New_York") # 1 to 31
dayyear(Sys.time(), "America/New_York") # 1 to 366
month(Sys.time(), "America/New_York") # 1 to 12
year(Sys.time(), "America/New_York")
Distance in calendar periods between two dates: dist returns a double
which indicates the number of units of the chosen calendar period:
dist(|.2009-01-01 13:12:00 America/New_York.|, |.2009-01-01 13:12:00 America/New_York.|, "day", tz="America/New_York")
Generating sequences
ztsdb provides a seq function similar to the R function of the same
name. Temporal sequences (either nanotime or nanoival) can be created
with a by argument that can be either a nanoduration or a nanoperiod.
In the case where it is a period, the tz argument must be specified in
order to associate a time zone with the operations.
one_day <- as.nanoperiod("1d")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2016-01-01 13:12:00 America/New_York.|,
by=one_day, tz="America/New_York")
one_second <- as.nanoduration("00:00:01")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2009-01-02 13:12:00 America/New_York.|,
by=one_second)
seq(from=|+2009-01-01 13:00:00 America/New_York -> 2009-01-01 15:00:00 America/New_York-|,
to= |+2010-01-01 13:00:00 America/New_York -> 2010-01-01 15:00:00 America/New_York-|,
by=one_day, tz="America/New_York")
CSV read/write
As in R, the functions read.csv and write.csv are provided. ztsdb
does not adhere strictly to RFC 4180. In particular, we use (and
expect) CR and not CRLF. And although we allow quoted elements, we
don't allow the separator to appear in a string. We believe these
functions to be mostly useful for time-series, where this limitation
has little impact.
Rolling functions
These functions "roll" over each column, calculating a value over a
given window of observations. They can be used either on a double
array or on a zts. Their signature is:
function(x, window, nvalid=window)
x is the double array or zts, window is an integer that defines the
number of observations on which the operation will be performed, and
nvalid is the number of non-NaN observations needed to consider a
result valid. For example, a window of 10 and an nvalid of 5 means
that if 5 or more non-NaN observations exist in the window, then the
result will be computed; otherwise it will be set to NaN. The
functions are rollmean, rollmin, rollmax, rollvar and rollcov. See an
example of the usage of rollcov here.
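The interplay between window and nvalid can be sketched in Python for rollmean on a single column. This is an illustrative model, not ztsdb's implementation; in particular, returning NaN for the first window - 1 positions (where the window is incomplete) is an assumption.

```python
import math

def rollmean(x, window, nvalid=None):
    """Rolling mean of the last `window` observations of one column (sketch)."""
    if nvalid is None:
        nvalid = window  # mirrors the default nvalid=window
    out = []
    for i in range(len(x)):
        if i + 1 < window:
            # assumption: positions without a full window yield NaN
            out.append(math.nan)
            continue
        valid = [v for v in x[i + 1 - window:i + 1] if not math.isnan(v)]
        if valid and len(valid) >= nvalid:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)
    return out

nan = math.nan
print(rollmean([1.0, nan, 3.0, 4.0, nan, 6.0], window=3, nvalid=2))
# [nan, nan, 2.0, 3.5, 3.5, 5.0]
```

With the default nvalid=window, every window above that contains a NaN would produce NaN instead.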
Array and zts transformation
These functions transform a double array or a zts column-wise.
locf
Last observation carried forward. A non-NaN observation is carried forward to fill in a NaN observation if the non-NaN and NaN observations are within the same window specified by n; the signature of this function is:
function(x, n)
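A minimal Python model of locf on one column follows. The description above does not pin down the exact window boundary, so this sketch assumes that a NaN at position i is filled from the last non-NaN observation at position j only when i - j <= n; it is not ztsdb's implementation.

```python
import math

def locf(x, n):
    """Fill NaNs with the last non-NaN value seen at most n positions earlier."""
    out = list(x)
    last_idx = None  # position of the most recent non-NaN observation
    for i, v in enumerate(x):
        if not math.isnan(v):
            last_idx = i
        elif last_idx is not None and i - last_idx <= n:
            out[i] = x[last_idx]
    return out

nan = math.nan
print(locf([1.0, nan, nan, nan, 5.0, nan], n=2))  # [1.0, 1.0, 1.0, nan, 5.0, 5.0]
```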
move
Moves all observations down or up depending on the value of n. A positive n moves observations down, while a negative n moves them up. NaN is assigned to the observations that are vacated by the move (at the beginning or at the end of the columns, depending on the direction of the move). The signature is:
function(x, n)
rotate
Works like the move function, but the observations wrap around, so no NaN values are produced. The signature is:
function(x, n)
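The difference between move and rotate on a single column can be modelled with list slicing in Python. This is a sketch under the sign convention described above (a positive n moves observations down, i.e. towards the end of the column), not ztsdb code.

```python
import math

def move(x, n):
    """Shift down by n (n > 0) or up by -n (n < 0); vacated slots become NaN."""
    size = len(x)
    if n >= 0:
        k = min(n, size)
        return [math.nan] * k + list(x[:size - k])
    k = min(-n, size)
    return list(x[k:]) + [math.nan] * k

def rotate(x, n):
    """Like move, but observations pushed off one end re-enter at the other."""
    size = len(x)
    if size == 0:
        return []
    k = n % size  # Python's % keeps k in [0, size) even for negative n
    return list(x[size - k:]) + list(x[:size - k])

print(move([1.0, 2.0, 3.0, 4.0], 1))     # [nan, 1.0, 2.0, 3.0]
print(move([1.0, 2.0, 3.0, 4.0], -1))    # [2.0, 3.0, 4.0, nan]
print(rotate([1.0, 2.0, 3.0, 4.0], 1))   # [4.0, 1.0, 2.0, 3.0]
print(rotate([1.0, 2.0, 3.0, 4.0], -1))  # [2.0, 3.0, 4.0, 1.0]
```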
rev
Reverses an array or a list, still column-wise. The signature is:
function(x)
Cumulative functions
These functions accumulate values; the rev parameter controls the
direction. The functions of this group are cumsum, cumprod, cumdiv,
cummax and cummin. Their signature is:
function(x, rev=FALSE)
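A Python sketch of the rev parameter, using itertools.accumulate on a single column. It assumes that rev=TRUE accumulates from the last element towards the first while keeping the result in the original order, and it models cumdiv as successive division (x1, x1/x2, x1/x2/x3, ...); both are assumptions, not documented ztsdb behaviour.

```python
import operator
from itertools import accumulate

def cum(x, op=operator.add, rev=False):
    """Cumulative reduction of x with op; rev=True accumulates from the end."""
    if not rev:
        return list(accumulate(x, op))
    # assumption: accumulate from the last element, then restore the original order
    return list(accumulate(reversed(x), op))[::-1]

print(cum([1, 2, 3, 4]))                          # [1, 3, 6, 10]
print(cum([1, 2, 3, 4], rev=True))                # [10, 9, 7, 4]
print(cum([8.0, 2.0, 2.0], op=operator.truediv))  # [8.0, 4.0, 2.0]
```

Passing op=max or op=min gives the cummax and cummin analogues.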
Aggregate functions
sum and prod, which provide respectively the sum and the product of
all elements of a vector/matrix/array, are provided and work as in R.
Time-series
zts, the time-series type, is composed of a nanotime vector and a
double array. The length of the nanotime index is the same as the
first dimension of the double array. This means that each nanotime
element is associated with a "horizontal" slice of the double array.
This first dimension has the same special time subsetting capabilities
as the nanotime type.
Creation
A time-series is created with a nanotime vector and a corresponding
double array (i.e. the length of the first dimension of the array is
the same as the length of the nanotime vector). Note that, like
arrays, time-series can have an arbitrary number of dimensions. And
just like arrays, a time-series can be memory-mapped by supplying the
optional argument file, which indicates where the memory-mapped files
will be written.
idx <- c(|.2015-03-09 06:38:01 America/New_York.|,
|.2015-03-09 06:38:02 America/New_York.|,
|.2015-03-09 06:38:03 America/New_York.|)
z <- zts(idx, matrix(1:6, 3, 2, dimnames=list(NULL, c("one", "two"))), file="memory_mapped_dir")
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
### 2015-03-09 06:38:03.000000000 EDT 3 6
Note that the dim argument can be omitted in the case of a
two-dimensional time-series, as the size can be calculated from the
length of the vector.
Accessors
zts is an aggregate type and its components can be accessed with the
following functions:
zts.idx(z)
zts.data(z)
Operations on time-series
Subset and subassign operations are defined similarly to those of
double arrays, and the first dimension follows the indexing semantics
of a nanotime vector.
ivl <- |+2015-03-09 06:38:01 America/New_York -> 2015-03-09 06:38:02 America/New_York+|
z[ivl,]
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
Arithmetic operations are defined as for double arrays:
z + z
### one two
### 2015-03-09 06:38:01.000000000 EDT 2 8
### 2015-03-09 06:38:02.000000000 EDT 4 10
### 2015-03-09 06:38:03.000000000 EDT 6 12
The bind family of functions is also defined, but note that a zts
index must remain strictly sorted (and consequently contain unique
values).
Align operations
The function align has the following signature:
align(from, to, start=as.nanoduration(0), end=as.nanoduration(0), method="closest", tz=NULL)
It aligns the observations of the zts from onto the nanotime vector
to, effectively yielding a new time-series that has the vector to as
its time index.
The arguments start and end define an interval which will be used to
pick a value out of from. The alignment algorithm is the following:
for each time t in to, define the interval i as [t + start; t + end[
(note that the start of the interval is closed whereas its end is
open, i.e. t + end is not part of the interval). For each i so
defined, pick a value out of from that is computed over the values of
from that fall in that interval. A negative start, such as -one_hour,
therefore looks back in time from t.
start and end can be either a nanoduration or a nanoperiod. If one of
the two is a nanoperiod, then tz needs to be defined in order to give
meaning to the interval.
The argument method controls which value will be picked out of from
for a given value of to, and can have the following values:
- closest: pick the observation that is closest to the time t in to; note that this method considers i to be the closed interval [t + start; t + end]
- count: count the number of observations in from that fall in i
- min: pick the observation with the smallest value
- max: pick the observation with the largest value
- mean: compute the mean of the observations
- median: compute the median of the observations
(Figure: visualization of align(t1, t2, -one_hour, method="closest"))
(Figure: visualization of align(t1, t2, -one_hour, method="count"))
### create a zts for the example:
one_second <- as.nanoduration("00:00:01")
one_hour <- as.nanoduration("01:00:00")
idx <- seq(|.2015-01-01 12:00:00 America/New_York.|,
|.2015-02-01 12:00:00 America/New_York.|,
by=one_second)
z <- zts(idx, data=0:(length(idx)-1))
### create a vector of nanotime onto which z will be aligned:
to <- seq(|.2015-01-01 12:00:00 America/New_York.|,
|.2015-02-01 00:00:00 America/New_York.|,
by=one_hour)
align(z, to, -one_hour, method="count")    # 3600 for every point after the first, whose window precedes the data
align(z, to, -one_hour, method="closest")  # the values of this zts will be 0, 3600, 7200, ...
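The "count" and "closest" methods can be modelled in Python on a single column, with plain integer seconds instead of nanotime. This sketch takes the window to be [t + start, t + end) for "count" and the closed window [t + start, t + end] for "closest", so a negative start such as -3600 looks back one hour, matching how the example above passes -one_hour. It is not ztsdb's implementation, and the other methods are not modelled.

```python
from bisect import bisect_left, bisect_right

def align(from_t, from_v, to, start=0, end=0, method="closest"):
    """Sketch of align() for one column, with plain integer timestamps.

    Assumes from_t is sorted. For each t in `to`, "count" uses the window
    [t + start, t + end) and "closest" uses the closed window [t + start, t + end].
    """
    out = []
    for t in to:
        lo, hi = t + start, t + end
        if method == "count":
            out.append(bisect_left(from_t, hi) - bisect_left(from_t, lo))
        elif method == "closest":
            i, j = bisect_left(from_t, lo), bisect_right(from_t, hi)
            if i == j:
                out.append(float("nan"))  # no observation in the window
            else:
                best = min(range(i, j), key=lambda k: abs(from_t[k] - t))
                out.append(from_v[best])
        else:
            raise ValueError(f"method {method!r} is not modelled in this sketch")
    return out

# one observation per second over three hours; each value is the elapsed seconds
from_t = list(range(3 * 3600 + 1))
from_v = list(from_t)
to = [0, 3600, 7200]
print(align(from_t, from_v, to, start=-3600, method="count"))    # [0, 3600, 3600]
print(align(from_t, from_v, to, start=-3600, method="closest"))  # [0, 3600, 7200]
```

With a one-hour look-back, the first point of to has an empty half-open window, so its count is 0, while "closest" still finds the observation at t itself because its window is closed.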
Additionally, the function align.idx is provided and has the signature:
align.idx(from, to, start=as.nanoduration(0), end=as.nanoduration(0), tz=NULL)
This function performs a "closest" alignment and, instead of returning
a time-series, returns the indices of the values in from.
op.zts operation
op.zts performs arithmetic operations between two time-series and
has the following signature, where x and y are time-series and op is
a string:
op.zts(x, y, op)
Each entry in the left time-series operand defines an interval
starting at the previous entry, and the value associated with this
interval is applied to all the observations in the right time-series
operand that fall in the interval. Note that the interval is closed at
the beginning and open at the end. The available values for op are
"*", "/", "+" and "-".
(Figure: visualization of op.zts(t1, t2, "*"))
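The interval rule of op.zts can be modelled in Python with bisect. This sketch is not ztsdb code and makes two assumptions the text does not settle: observations of y that fall outside [first x entry, last x entry) are dropped, and the result is computed as y value op x value.

```python
import operator
from bisect import bisect_right

OPS = {"*": operator.mul, "/": operator.truediv, "+": operator.add, "-": operator.sub}

def op_zts(x_t, x_v, y_t, y_v, op):
    """Sketch: apply x's value for [x_t[k-1], x_t[k]) to each y observation in it.

    Assumptions (not from the ztsdb source): y observations outside
    [x_t[0], x_t[-1]) are dropped, and the result is y_value `op` x_value.
    """
    f = OPS[op]
    out_t, out_v = [], []
    for t, v in zip(y_t, y_v):
        k = bisect_right(x_t, t)  # index of the first x entry strictly after t
        if 0 < k < len(x_t):      # t lies in [x_t[k-1], x_t[k])
            out_t.append(t)
            out_v.append(f(v, x_v[k]))
    return out_t, out_v

x_t, x_v = [0, 10, 20], [1.0, 2.0, 3.0]
y_t, y_v = [0, 5, 10, 15, 25], [1.0, 1.0, 1.0, 1.0, 1.0]
print(op_zts(x_t, x_v, y_t, y_v, "*"))  # ([0, 5, 10, 15], [2.0, 2.0, 3.0, 3.0])
```

The observation at 10 picks up the value of the entry at 20, because the interval [0, 10) is open at its end.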
Connecting and querying
Connection
A connection is a handle to a remote ztsdb instance. The underlying
protocol of a connection is TCP. It is created like this:
c1 <- connection(host="127.0.0.1", port=19001)
A connection is created only if the connection was successfully
established with the remote instance.
With a connection it is possible to run any code remotely using the ?
(query) operator:
c1 ? 1 # evaluate 1 remotely
c1 ? 1 + 1 # evaluate 1+1 remotely
c1 ? a <<- array(1:27, c(3,3,3)) # create 'a' remotely in the global environment
c1 ? a # get 'a'
c1 ? a[1, 2, 1]
c1 ? a[,1,1]
c1 ? { b <- 2; a * b } # create 'b' in the remote context environment
# and send back 'a * b'
Escape operator
It is also possible to escape code with the ++ operator, so that it
is evaluated locally before being sent remotely as part of the query:
la <- 2
c1 ? ++la * a # take 'la' locally, send it over to the remote instance,
# multiply it by the remote 'a' and send result back
c1 ? ++{ lb <- 2; lc <- 3; lb * lc } * a # 6 * 'a' where 6 is evaluated locally
More complicated schemes are possible, such as defining remote handles, remote escapes, etc.
Synchronous and asynchronous queries
A query is immediately dispatched to the remote instance for interpretation. Locally, a future is created as a placeholder for the result of the query. Execution then continues until the value of the future is needed, i.e. when it is used in an expression (or must be returned as the result of a query). This means that whether a query is synchronous or asynchronous is controlled by whether or not its result is used.
### synchronous:
a <- (c1 ? x) + (c2 ? y) # sync'd by the '+'; the queries go out in parallel to 'c1' and 'c2'
### asynchronous:
{ c1 ? x; c2 ? y; NULL } # the result of 'c1' and 'c2' are never used
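The future-based behaviour can be mimicked with Python's concurrent.futures, which makes the synchronisation point explicit. This is only an analogy: remote_query is a hypothetical stand-in for a remote ztsdb instance, and threads stand in for the network round trips.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_query(name, delay):
    # hypothetical stand-in for evaluating a variable on a remote instance
    time.sleep(delay)
    return {"x": 1, "y": 2}[name]

with ThreadPoolExecutor() as pool:
    t0 = time.monotonic()
    fx = pool.submit(remote_query, "x", 0.2)  # dispatched immediately
    fy = pool.submit(remote_query, "y", 0.2)  # runs in parallel with fx
    a = fx.result() + fy.result()             # the '+' is the synchronisation point
    elapsed = time.monotonic() - t0

print(a)  # 3
```

Because both queries are in flight before either result is needed, the total wait is roughly one delay rather than two.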
Timers
It is possible to repeat the execution of code at regular intervals. A timer creates a new interpretation context, and when a timer is destroyed, its interpretation context is torn down too. To avoid this, a timer can of course be declared in global scope.
A timer has the following signature:
function(nanoduration, loop, once=NULL, loop_max=0)
loop_max indicates the number of repetitions; a value of 0 indicates
infinite repetitions. The once argument takes an expression that is
evaluated only once. It is useful, for example, for creating local
variables and setting up the job that will be done by the loop code.
The latter is an expression that will be run at each timer expiry.
Timers are useful for a large variety of tasks: data distribution and backup, data transformation, etc. The following example calculates mean-minutes and stores them into a time-series available for querying.
Built-in functions
This source file has a list of the built-in functions together with their signatures and the allowable parameter types. For most of these functions, the functionality and parameters are the same as in R.
Environments and assignments
Environments
ztsdb has a notion of environment, but environments are not a first-class type as in R. Another difference is that ztsdb has dynamic scoping (see Scoping).
The environment hierarchy is the following: "base" <- "global" <- ... <- "current"
- "base" is the parent environment and contains all the built-in functions
- "global" is the parent of all interpretation context environments, and the place where global variables (e.g. time-series that should be shared between different interpretation contexts) should be defined
- the ellipsis stands for a set of 0..n environments
- the environment after "global" is always the interpretation context environment
Managing environment content
The functions that help manage an environment's content are assign,
get, ls and rm (and its synonym remove); they work roughly as in R.
For convenience, the function lsg is provided: it is the same as ls
except that name is initialized by default to "global", so by default
it lists the content of the global environment.
The signatures are:
assign(x, value, envir="current", inherits=FALSE)
get(x, envir="current", inherits=FALSE)
ls(name="current")
lsg(name="global")
rm(..., list=character(), envir="current", inherits=FALSE)
Assignments
Simple assign
The simple assign operator (<-) always assigns to a variable in the
current environment. This means that a variable created by the simple
assign will never be visible to another interpretation context. If a
variable is created in a function, then the variable will be local to
this function.
Special assign
The special assign operator (<<-) works as in R. It looks in the
current environment for the variable and, if it is not found, examines
the parent environment, and so on. If it doesn't find a pre-existing
variable and it reaches the "global" environment, then it creates a
new variable there and makes the assignment.
Caution must be exercised when using the special assign to declare
global variables, because it might result in an assignment unwittingly
occurring in a child of the "global" environment. A safer way to
achieve global assignment is to use the assign function and to specify
the parameter envir as "global":
a <<- 123 # dangerous if 'a' is already defined in a child environment
assign("a", 123, envir="global") # safe
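The risk described above can be reproduced with a small Python model of the environment chain. The Env class and special_assign function are hypothetical names; the lookup follows the description in this section (search the current environment, then its parents, and create the variable in "global" if no binding is found on the way), which may differ in detail from ztsdb's implementation.

```python
class Env:
    """Hypothetical model of a ztsdb environment: a name, a parent and bindings."""
    def __init__(self, name, parent=None):
        self.name, self.parent, self.vars = name, parent, {}

def special_assign(current, name, value):
    """Model of `name <<- value`: assign to the first existing binding upwards,
    or create the variable in "global" if none is found before reaching it."""
    env = current
    while env is not None:
        if name in env.vars or env.name == "global":
            env.vars[name] = value
            return env
        env = env.parent
    raise LookupError('no "global" environment above the current one')

base = Env("base")
glob = Env("global", base)
ctx = Env("context", glob)  # an interpretation context environment
fn = Env("function", ctx)   # e.g. a function call inside that context

ctx.vars["a"] = 0
print(special_assign(fn, "a", 123).name)  # context: 'a' already existed there
print(special_assign(fn, "b", 456).name)  # global: no binding found on the way up
```

The first call shows the pitfall: the intended global assignment silently lands in the interpretation context because a variable of the same name already lives there.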
Errors
Compared to R, ztsdb has a simplified mechanism for handling errors,
and there is no concept of warning. Errors can only be captured via
.Last.error.
As in R, ztsdb implements a try/catch mechanism, which can be used like this:
## the following returns -1
error_string <- "invalid type for binary operator (double + string)"
a <- tryCatch(1 + "a", if (.Last.error==error_string) -1)
a # 'a' has value -1
Durability
Persistent objects are arrays and zts that are declared with the file
parameter (see Arrays). The objects are then memory-mapped to a set of
files in the directory indicated by file. To allow a deterministic
file state, the function msync is provided. It has the signature:
function(x, async=FALSE)
The async parameter determines whether the operation is asynchronous.
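For readers unfamiliar with msync, the Python sketch below shows the underlying mechanism on a single memory-mapped file: writes go to mapped memory, and flush() forces them back to the file (on Unix, CPython's mmap.flush() calls msync with MS_SYNC, the synchronous case). The file name and layout are arbitrary and unrelated to ztsdb's on-disk format.

```python
import mmap
import os
import tempfile

# Create an 8-byte file to map (an arbitrary layout, unrelated to ztsdb's format).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 8)

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 8) as m:
        m[0:4] = b"ztsd"  # writes go to the mapped memory, not through write()
        m.flush()         # synchronous flush: returns once the pages are written back

with open(path, "rb") as f:
    content = f.read()
print(content)  # b'ztsd\x00\x00\x00\x00'
```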