# Types

- Array types
  - *double* (64-bit floating point)
  - *logical* (boolean)
  - *character* (string)
  - *time* (date/time)
  - *interval* (date/time interval)
  - *duration* (duration of time in nanoseconds)
  - *period* (a duration that can handle the variable durations of day, month and year)

- Composite types
  - *zts* (time-series)
  - *list*

- Special types
  - *connection* (connection to another ztsdb instance)
  - *builtin* (built-in function)
  - *function* (user-defined function)
  - *error*

## Arrays

The various constructor functions (*vector*, *matrix* and *array*)
all yield the same array type, respectively 1-dimensional,
2-dimensional and >=3-dimensional arrays. In particular, a scalar is a
1-dimensional array of size 1. The following types are arrays (note in
particular the absence of an integer type):

- *double*
- *logical*
- *character*
- *time*
- *interval*
- *duration*
- *period*

The constructors are similar to R:

```
1 # vector of size 1
vector(mode="double", 3)
c(1, 2, 3, 4, 5)
### [1] 1 2 3 4 5
c(TRUE, FALSE, FALSE, TRUE)
### [1] TRUE FALSE FALSE TRUE
matrix(1:9, 3, 3, dimnames=list(c("i","ii","iii"), c("one","two","three")))
array(1:8, c(2,2,2), dimnames=list(NULL,NULL,c("one","two")))
```

One significant difference from R is that arrays (as well as
time-series) can be persistent. This is controlled by the *file*
argument, which indicates the name of the directory that will be
created to hold the memory-mapped files associated with the
array:

```
a <- array(1:2e6, c(1e6,2), file="/tmp/memory_mapped_array_directory")
```

The *str* function displays the memory-mapped directory if any:

```
str(a)
### displays:
### double - ord [1:1000000, 1:2] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 ...
### - mmap file = /tmp/memory_mapped_array_directory
```

Another feature of arrays is that ztsdb keeps track of the ordering
of each column. *str* will indicate that all columns are ordered by
printing out "ord". The function *is.ordered* can also be used to
determine this programmatically:

```
is.ordered(1:10) # TRUE
is.ordered(10:1) # FALSE
str(1:10)
### double - ord [1:10] 1 2 3 4 5 6 7 8 9 10
### - malloc-based
str(10:1)
### double [1:10] 10 9 8 7 6 5 4 3 2 1
### - malloc-based
```

## Date/time representation

ztsdb has four built-in date and time related types. They are
*time*, *interval*, *duration* and *period*.

### time

A *time* is a specific point in time with nanosecond precision:

```
timepoint1 <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint2 <- as.time("2009-01-01 13:12:00.000000001 America/New_York")
```

It is encoded as the nanosecond offset since 1970-01-01 UTC. This
means the *time* range is approximately from year 1386 to year
2554. It does not have an associated time zone, but can be displayed
in any desired time zone with the *print* function. It follows the
POSIX convention, in particular it does not have the notion of leap
seconds.
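
As a rough illustration of this encoding, the following Python sketch (not ztsdb code) computes the nanosecond offset of the time point used above. The helper name `to_nanos` is hypothetical, and the New York offset is hard-coded to UTC-5 (EST on that date) instead of being looked up in a time-zone database.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_nanos(dt, extra_ns=0):
    """Nanoseconds between the epoch and dt, plus a sub-microsecond remainder.

    Python datetimes only resolve microseconds, so the last three digits
    of a nanosecond timestamp are passed separately in extra_ns.
    """
    whole_seconds = (dt - EPOCH) // timedelta(seconds=1)  # integer, no float rounding
    return whole_seconds * 1_000_000_000 + dt.microsecond * 1_000 + extra_ns

# Assumption: America/New_York is at UTC-5 on 2009-01-01.
est = timezone(timedelta(hours=-5))
ns = to_nanos(datetime(2009, 1, 1, 13, 12, 0, tzinfo=est), extra_ns=1)
print(ns)
```

Because the stored value is a plain offset from a UTC epoch, the time zone only matters when parsing or printing, which is consistent with a *time* having no associated zone.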

### interval

An *interval* is represented by two points in time and the start or
the end of the interval can be either closed or open. In the string
constructor, a closed start(end) is indicated with the '+' sign and
indicates that the start(end) is included in the interval. An open
start(end) is indicated with the '-' sign and means it is not included
in the interval. For the constructor version that takes times, two
additional arguments may be specified, *sopen* and *eopen* which, when
true indicate that, respectively, the start or end of the interval is
open. By default an interval has a closed start and an open end.

```
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
as.interval("-2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York+")
start <- |.2009-01-01 13:12:00 America/New_York.|
one_hour <- as.duration("01:00:00")
end <- start + one_hour
### all the following produce the same interval:
interval(start, end) # by default sopen=F, eopen=T
interval(start, end, sopen=F, eopen=T)
interval(start, duration=one_hour)
```

It is encoded as two *time* values with additional flags indicating
whether the start and end of the interval are open or closed. Accessors
are defined in order to access its components:

```
ival <- |+2009-01-01 13:12:00 America/New_York -> 2009-02-01 15:11:03 America/New_York-|
interval.start(ival) # |.2009-01-01 13:12:00 America/New_York.|
interval.end(ival) # |.2009-02-01 15:11:03 America/New_York.|
interval.sopen(ival) # FALSE
interval.eopen(ival) # TRUE
```
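
To make the open/closed flags concrete, here is a minimal Python model (not ztsdb code) of an interval and a membership test. The class name `Interval` and the method `contains` are hypothetical; the field names mirror the *sopen* and *eopen* arguments, and times are plain integers standing in for nanosecond offsets.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: int           # stand-in for a nanosecond offset
    end: int
    sopen: bool = False  # closed start by default
    eopen: bool = True   # open end by default

    def contains(self, t):
        """True if t falls inside the interval, honouring the open flags."""
        after_start = t > self.start if self.sopen else t >= self.start
        before_end = t < self.end if self.eopen else t <= self.end
        return after_start and before_end

ival = Interval(start=10, end=20)
print(ival.contains(10), ival.contains(20))  # start is included, end is not
```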

### duration

A *duration* is a count of nanoseconds, which may be negative:

```
one_second <- as.duration(1e9)
one_second <- as.duration("00:00:01")
one_hour <- as.duration("01:00:00")
one_nanosecond <- as.duration("00:00:00.000_000_001")
```

### period

*period* represents the calendar or "business" view of a duration with
the concepts of month and day. The exact duration of a period is
unknown until it is anchored to a point in time and associated with a
time-zone.
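
The following Python sketch (not ztsdb code) illustrates why a period has no fixed length: the same "one month" spans a different number of days depending on its anchor. The helper `add_one_month` is hypothetical, it works on naive wall-clock datetimes, and it deliberately ignores daylight saving time, which is the additional reason a time zone is needed.

```python
from datetime import datetime

def add_one_month(dt):
    """Move dt to the same day and wall-clock time in the next month.

    Sketch only: raises ValueError if that day does not exist in the
    target month (e.g. January 31st).
    """
    if dt.month == 12:
        return dt.replace(year=dt.year + 1, month=1)
    return dt.replace(month=dt.month + 1)

january = datetime(2009, 1, 1, 13, 12)
february = datetime(2009, 2, 1, 13, 12)
print((add_one_month(january) - january).days)    # length of "1m" anchored in January
print((add_one_month(february) - february).days)  # length of "1m" anchored in February
```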

*period* is composed of two parts: a month/days part and a
duration. These two components may have opposite signs.

Note that for convenience reasons the constructor syntax allows specifying years and weeks, but they are converted to their representation in months/days.

```
### constructor from string:
one_month_one_day <- as.period("1m1d")
one_day_minus_12_hours <- as.period("1d/-12:00:00")
one_of_everything <- as.period("1y1m1w1d/01:01:01.000_000_001")
### constructor from double and duration arguments:
one_month_one_day <- period(months=1, days=1)
one_day_minus_12_hours <- period(days=1, duration=as.duration("-12:00:00"))
```

Accessors for *period*'s components are provided:

```
ones <- as.period("1y1m1w1d/01:01:01.000_000_001")
period.month(ones)
period.day(ones)
period.duration(ones)
```
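
Since years and weeks are converted to months and days, the month and day components of the period above can be worked out by hand. The Python sketch below (not ztsdb code) assumes the conversion is the plain one, 12 months per year and 7 days per week; the function name `normalize_period` is hypothetical.

```python
def normalize_period(years=0, months=0, weeks=0, days=0):
    """Fold years into months and weeks into days; returns (months, days)."""
    return (12 * years + months, 7 * weeks + days)

# Components of "1y1m1w1d" under the assumption above.
print(normalize_period(years=1, months=1, weeks=1, days=1))
```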

# Operations

## Arithmetic operations

Arithmetic operations are straightforward. As in R, one can use either the infix notation or the functional notation, the latter by enclosing the operator in backquotes.

```
1 + 1 == `+`(1, 1) # TRUE
```

## Arithmetic operations on temporal types

A *duration* can be added, subtracted, multiplied or divided, and the result is a *duration*.

```
one_second <- as.duration("00:00:01")
one_second + one_second
### [1] 00:00:02
three_seconds <- 3 * one_second
three_seconds / 3
### [1] 00:00:01
```

A *duration* can be added to or subtracted from a *time* or an
*interval*, and the result remains respectively a *time* or an
*interval*.

```
one_second <- as.duration("00:00:01")
timepoint <- |.2009-01-01 13:12:00.000000001 America/New_York.|
timepoint + one_second
### [1] 2009-01-01 13:12:01.000000001 EST
```

A *period* can be added to or subtracted from a *time* or an
*interval*, and the result remains respectively a *time* or an
*interval*.

The functional way of specifying the operators (`+` and `-`) is
needed for the addition/subtraction of a *period* because an
additional time-zone argument must be specified. For example:

```
one_month <- as.period("1m")
`+`(|.2009-01-01 13:12:00 America/New_York.|, one_month, "America/New_York")
### [1] 2009-02-01 13:12:00 EST
```

## Subsetting and subassignment

Subsetting and subassignment work as in R. An index can be a *logical*
vector or a *double* vector or a *character* vector. Additionally a
*time* vector can be subsetted by either a *time* or an *interval*
vector, and an *interval* vector can be subsetted by an *interval*
vector (see also Date/time intersection).

```
a[1:10, 1]
a[1:10, c(TRUE,FALSE)]
a[,]
a[1:10, ] <- a[1:20]
```

## Set operations

The set operations *intersect*, *union* and *setdiff* are defined for
both *time* and *interval* vectors. If these vectors are not ordered,
they will be sorted before the operation is carried out. This means
that for unordered vectors, there is a performance penalty for these
operations that must be taken into consideration.

Additionally, each of these set operations has a counterpart function
*intersect.idx*, *union.idx* and *setdiff.idx* that, instead of
computing a new set, returns the index of the set.

### Intersection

```
time[interval]
interval[time]
interval[interval]
ts[interval]
ts[time]
```

or alternatively:

```
intersect(time, interval)
intersect(interval, interval)
```

### Union

```
union(interval, interval) # gives back the minimal interval set
union(time, time)
```
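
As an illustration of what a "minimal interval set" means, the Python sketch below (not ztsdb code) merges two lists of intervals so that overlapping or touching intervals collapse into one. It is an assumption-laden model: intervals are `(start, end)` integer pairs, all treated as closed at the start and open at the end, and the name `union_intervals` is hypothetical.

```python
def union_intervals(a, b):
    """Merge two lists of (start, end) pairs into a minimal sorted list."""
    merged = []
    for start, end in sorted(a + b):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

print(union_intervals([(1, 3), (6, 8)], [(2, 4), (8, 9)]))
```

Sorting first is what makes the single pass sufficient, which also mirrors the remark above that unordered inputs carry a sorting cost.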

### Difference

```
setdiff(time, time)
setdiff(interval, interval)
setdiff(time, interval)
```

## Calendar operations

Rounding is defined for *time* and *interval*, with the following set
of constants: "second", "minute", "hour", "day", "week", "month",
"quarter", "year". For all constants that require the computation to
take into account daylight saving time, the time zone argument *tz*
is required. For example:

```
round(time, "day", tz="Europe/London")
round(time, "minute")
round(interval, "month", tz="America/New_York")
round(interval, "second")
```

Conversion of calendar periods to an integer value is defined for *time* objects:

```
dayweek(Sys.time(), "America/New_York") # 0 to 6 (0 is Sunday)
daymonth(Sys.time(), "America/New_York") # 1 to 31
dayyear(Sys.time(), "America/New_York") # 1 to 366
month(Sys.time(), "America/New_York") # 1 to 12
year(Sys.time(), "America/New_York")
```

The distance in calendar periods between two dates is given by *dist*,
which returns a *double* indicating the number of units of the chosen
calendar period:

```
dist(|.2009-01-01 13:12:00 America/New_York.|, |.2009-01-01 13:12:00 America/New_York.|, "day", tz)
```

## Generating sequences

ztsdb provides a *seq* function similar to R's. Temporal
sequences (either *time* or *interval*) can be created with a *by*
argument that can be either a *duration* or a *period*. In the case
where it is a *period*, the *tz* argument must be specified in order to
associate a time-zone with the operations.

```
one_day <- as.period("1d")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2016-01-01 13:12:00 America/New_York.|,
by=one_day, tz="America/New_York")
one_second <- as.duration("00:00:01")
seq(from=|.2009-01-01 13:12:00 America/New_York.|,
to= |.2009-01-02 13:12:00 America/New_York.|,
by=one_second)
seq(from=|+2009-01-01 13:00:00 America/New_York -> 2009-01-01 15:00:00 America/New_York-|,
to= |+2010-01-01 13:00:00 America/New_York -> 2010-01-01 15:00:00 America/New_York-|,
by=one_day, tz="America/New_York")
```

## CSV read/write

As in R, the functions *read.csv* and *write.csv* are provided. ztsdb
does not adhere strictly to RFC 4180. In particular we use (and
expect) *CR* and not *CRLF*. And although we allow quoted elements, we
don't allow the separator to appear in a string. We believe these
functions to be mostly useful for time-series, where this limitation
has little impact.

## Rolling functions

These functions "roll" over each column calculating a value over a
given window of observations. They can be used either on *double* or
on *zts*. Their signature is:

```
function(x, window, nvalid=window)
```

*x* is the *double* or *zts*, *window* is an integer that defines the
number of observations over which the operation is performed, and
*nvalid* is the number of non-NaN observations needed to consider a
result valid. For example, a *window* of 10 and an *nvalid* of 5 mean
that if 5 or more non-NaN observations exist in the window, then the
result is computed; otherwise it is set to NaN. The
functions are *rollmean*, *rollmin*, *rollmax*, *rollvar* and
*rollcov*. See an example of the usage of *rollcov*
here.
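
The interplay between the window and the validity threshold can be modelled with a short Python sketch (not ztsdb code). Two points are assumptions, not statements about ztsdb: the window is taken to be trailing (it ends at the current observation), and the mean is computed over the non-NaN observations only.

```python
import math

def rollmean(x, window, nvalid=None):
    """Trailing rolling mean over a list of floats; NaN marks a missing value."""
    if nvalid is None:
        nvalid = window  # same default as in the signature above
    out = []
    for i in range(len(x)):
        lo = max(0, i - window + 1)
        valid = [v for v in x[lo:i + 1] if not math.isnan(v)]
        if valid and len(valid) >= nvalid:
            out.append(sum(valid) / len(valid))
        else:
            out.append(math.nan)  # not enough non-NaN observations
    return out

data = [1.0, 2.0, math.nan, 4.0]
print(rollmean(data, window=2, nvalid=1))
print(rollmean(data, window=2))
```

With the threshold lowered to 1, every window that holds at least one real value yields a result; with the default, any window touching the NaN (or the incomplete first window) yields NaN.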

## Array and zts transformation

These functions transform a *double* or a *zts* column-wise.

### locf

Last observation carried forward. A non-NaN observation is carried forward to fill in a NaN observation if the non-NaN and NaN observations are in the same window specified by *n*. The signature of this function is:

```
function(x, n)
```
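
A possible reading of this rule is sketched below in Python (not ztsdb code). It assumes that "in the same window" means the last non-NaN observation and the NaN to fill are both contained in a run of *n* consecutive observations; the exact boundary convention in ztsdb may differ.

```python
import math

def locf(x, n):
    """Carry the last non-NaN value forward, at most n - 1 positions."""
    out = list(x)
    last_idx = None  # position of the most recent non-NaN observation
    for i, v in enumerate(x):
        if not math.isnan(v):
            last_idx = i
        elif last_idx is not None and i - last_idx < n:
            out[i] = x[last_idx]
    return out

print(locf([1.0, math.nan, math.nan, 4.0, math.nan], 2))
```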

### move

Moves all observations down or up depending on the value of *n*. A positive *n* moves down while a negative *n* moves up. A NaN value is assigned to the positions that are vacated by the move (at the beginning or at the end of the columns, depending on the direction of the move). The signature is:

```
function(x, n)
```

### rotate

Works as the *move* function, but the observations wrap around and so no NaN are produced. The signature is:

```
function(x, n)
```
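
The difference between *move* and *rotate* can be seen on a single column with the Python sketch below (not ztsdb code). It assumes that "down" means towards higher row indices.

```python
import math

def move(x, n):
    """Shift x by n positions (positive: down), padding vacated slots with NaN."""
    size = len(x)
    if n >= 0:
        return [math.nan] * min(n, size) + x[:max(size - n, 0)]
    return x[-n:] + [math.nan] * min(-n, size)

def rotate(x, n):
    """Shift x by n positions (positive: down), wrapping values around."""
    size = len(x)
    if size == 0:
        return []
    k = n % size  # a negative n becomes the equivalent positive rotation
    return x[size - k:] + x[:size - k]

col = [1.0, 2.0, 3.0, 4.0]
print(move(col, 1), move(col, -1))
print(rotate(col, 1), rotate(col, -1))
```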

### rev

Reverses, still column-wise, an *array* or a *list*. The signature is:

```
function(x)
```

## Cumulative functions

These functions accumulate values. The *rev* parameter controls the direction of accumulation. The functions of this group are: *cumsum*, *cumprod*, *cumdiv*, *cummax*, *cummin*. Their signature is:

```
function(x, rev=FALSE)
```
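
A minimal Python sketch (not ztsdb code) of a cumulative sum in both directions follows. It assumes that with `rev=TRUE` the accumulation starts from the last element and that each result stays at the position of the element it was computed at; ztsdb's actual output layout is not specified here.

```python
from itertools import accumulate

def cumsum(x, rev=False):
    """Cumulative sum from the front, or from the back when rev is True."""
    if rev:
        # Accumulate over the reversed input, then restore the original order.
        return list(accumulate(reversed(x)))[::-1]
    return list(accumulate(x))

print(cumsum([1, 2, 3]))
print(cumsum([1, 2, 3], rev=True))
```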

## Aggregate functions

*sum* and *prod*, which return respectively the sum and the product of
all elements of a vector, matrix or array, are provided and work as in R.

# Time-series

*zts*, the time series type, is composed of a *time* vector and a
*double* array. The length of the *time* index is the same as the
first dimension of the array of double. This means that each *time*
element is associated to a "horizontal" slice of the array of
double. This first dimension has the same special time subsetting
capabilities as the *time* type.

## Creation

A time series is created with a *time* vector and a corresponding
*double* array (i.e. the length of the first dimension of the array is the
same as the length of the *time* vector). Note that like arrays, time
series can have an arbitrary number of dimensions. And just like
arrays, a time series can be memory-mapped by supplying the optional
argument *file* which indicates where the memory-mapped files will be
written.

```
idx <- c(|.2015-03-09 06:38:01 America/New_York.|,
|.2015-03-09 06:38:02 America/New_York.|,
|.2015-03-09 06:38:03 America/New_York.|)
z <- zts(idx, matrix(1:6, 3, 2, dimnames=list(NULL, c("one", "two"))), file="memory_mapped_dir")
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
### 2015-03-09 06:38:03.000000000 EDT 3 6
```

Note that the *dim* argument can be omitted in the case of a
two-dimensional time-series, as the size can be calculated from the
length of the *vector*.

## Accessors

*zts* is an aggregate type and its components can be accessed with the
following functions:

```
zts.idx(z)
zts.data(z)
```

## Operations on time-series

Subset and subassign operations are defined similarly to *double*, and
the first dimension follows the indexation semantics of a *time*
vector.

```
ivl <- |+2015-03-09 06:38:01 America/New_York -> 2015-03-09 06:38:02 America/New_York+|
z[ivl,]
### one two
### 2015-03-09 06:38:01.000000000 EDT 1 4
### 2015-03-09 06:38:02.000000000 EDT 2 5
```

Arithmetic operations are defined as for *double*:

```
z + z
### one two
### 2015-03-09 06:38:01.000000000 EDT 2 8
### 2015-03-09 06:38:02.000000000 EDT 4 10
### 2015-03-09 06:38:03.000000000 EDT 6 12
```

The bind family of functions is also defined, but note that a *zts*
index must remain strictly sorted (and consequently with unique
values).

## Align operations

The function *align* has the following signature:

```
align(from, to, start=as.duration(0), end=as.duration(0), method="closest", tz=NULL)
```

It aligns the observations of the *zts* `from` onto the vector of
*time* `to`, effectively yielding a new time-series that has the
vector `to` as time index.

The arguments `start` and `end` define an interval which will be used
to pick a value out of `from`. The alignment algorithm is the
following: for each time *t* in `to`, define the interval *i*
`[t + start; t + end[` (note that the start is closed whereas the end
is open, i.e. *t + end* is not part of the interval). For each *i* so
defined, pick a value out of `from` that is computed over the values
of `from` that fall in that interval.

*start* and *end* can either be a *duration* or a *period*. If one of
the two is a *period* then *tz* needs to be defined in order to give
meaning to the interval.

The argument *method* controls which value will be picked out of
`from`

for a given value of `to`

and can have the values:

- closest: pick the observation that is closest to the time *t* in `to`; note that this method will consider *i* to be the closed interval `[t + start; t + end]`
- count: count the number of observations in `from` that fall in *i*
- min: pick the observation with the smallest value
- max: pick the observation with the largest value
- mean: compute the mean of the observations
- median: compute the median of the observations

*Figure: visualization of `align(t1, t2, -one_hour, "closest")`.*

*Figure: visualization of `align(t1, t2, -one_hour, "count")`.*

```
### create a zts for the example:
one_second <- as.duration("00:00:01")
idx <- seq(|.2015-01-01 12:00:00 America/New_York.|,
|.2015-02-01 12:00:00 America/New_York.|,
by=one_second)
z <- zts(idx, data=0:(length(idx)-1))
### create a vector of time onto which z will be aligned:
one_hour <- as.duration("01:00:00")
to <- seq(|.2015-01-01 12:00:00 America/New_York.|,
          |.2015-02-01 00:00:00 America/New_York.|,
          by=one_hour)
align(z, to, -one_hour, method="count") # the values of this zts will be 3600
align(z, to, -one_hour, method="closest") # the values of this zts will be 0, 3600, 7200, ...
```
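
The "count" and "closest" methods can be modelled in a few lines of Python (not ztsdb code). This is a sketch under stated assumptions: times are integers, `from_times` is sorted, the window for each *t* is obtained by adding `start` and `end` to *t* (which is what passing `-one_hour` as `start` implies), and ties in "closest" go to the earlier observation. The function and parameter names are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

def align(from_times, from_values, to, start=0, end=0, method="closest"):
    """Align (from_times, from_values) onto the time vector `to`."""
    out = []
    for t in to:
        lo = bisect_left(from_times, t + start)      # start is closed
        if method == "count":
            hi = bisect_left(from_times, t + end)    # end is open
            out.append(max(hi - lo, 0))
        elif method == "closest":
            hi = bisect_right(from_times, t + end)   # end is closed here
            if hi <= lo:
                out.append(math.nan)                 # nothing in the window
            else:
                best = min(range(lo, hi), key=lambda i: abs(from_times[i] - t))
                out.append(from_values[best])
        else:
            raise ValueError("unsupported method: " + method)
    return out

times = list(range(20))   # one observation per time unit
values = list(range(20))  # value equals position, as in the example above
print(align(times, values, [0, 5, 10], start=-5, method="count"))
print(align(times, values, [0, 5, 10], start=-5, method="closest"))
```

In this model the first aligned point has a count of 0 because no observation precedes it, and every later point counts exactly the observations in the preceding window.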

Additionally, the function *align.idx* is provided and has the signature:

```
align.idx(from, to, start=as.duration(0), end=as.duration(0), tz=NULL)
```

This function makes a "closest" align and, instead of returning a
time-series, returns the index of the values in `from`.

## op.zts operation

*op.zts* performs arithmetic operations between two time series and
has the following signature, where *x* and *y* are time-series and
*op* is a string:

```
op.zts(x, y, op)
```

Each entry in the left time-series operand defines an interval from
the previous entry, and the value associated with this interval will
be applied to all the observations in the right time-series operand
that fall in the interval. Note that the interval is closed at the
beginning and open at the end. The available values for *op* are
`"*"`, `"/"`, `"+"` and `"-"`.

*Figure: visualization of `op.zts(t1, t2, "*")`.*

# Connecting and querying

## Connection

A *connection* is a handle to a remote ztsdb instance. The underlying
protocol of a connection is TCP. It is created like this:

```
c1 <- connection(host="127.0.0.1", port=19001)
```

A *connection* is created only if the connection was successfully
established with the remote instance.

With a *connection* it is possible to run any code remotely using the
*?* (query) operator:

```
c1 ? 1 # evaluate 1 remotely
c1 ? 1 + 1 # evaluate 1+1 remotely
c1 ? a <<- array(1:27, c(3,3,3)) # create 'a' remotely in the global environment
c1 ? a # get 'a'
c1 ? a[1, 2, 1]
c1 ? a[,1,1]
c1 ? { b <- 2; a * b } # create 'b' in the remote context environment
# and send back 'a * b'
```

## Escape operator

It is also possible to escape code with the *++* operator, so that it
is evaluated locally before being sent remotely as part of the query:

```
la <- 2
c1 ? ++la * a # take 'la' locally, send it over to the remote instance,
# multiply it by the remote 'a' and send result back
c1 ? ++{ lb <- 2; lc <- 3; lb * lc } * a # 6 * 'a' where 6 is evaluated locally
```

More complicated schemes are possible, such as defining remote handles, remote escapes, etc.

## Synchronous and asynchronous queries

A query is immediately dispatched to the remote instance for interpretation. Locally, a future is created as a placeholder for the result of the query. The execution of the code then continues until the value of the future is needed when it is used in an expression (or needs to be returned as the result of a query). This means that it is possible to control if a query is synchronous or asynchronous respectively by using or not using the result of the query.

```
### synchronous:
a <- (c1 ? x) + (c2 ? y) # sync'd by the '+'; the queries go out in parallel to 'c1' and 'c2'
### asynchronous:
{ c1 ? x; c2 ? y; NULL } # the result of 'c1' and 'c2' are never used
```
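
The future-based behaviour described above has a close analogue in Python's `concurrent.futures`, sketched below. This is not ztsdb code: `remote_eval` is a hypothetical stand-in for a remote instance, and a thread pool replaces the TCP connections.

```python
from concurrent.futures import ThreadPoolExecutor

def remote_eval(value):
    """Stand-in for a remote instance evaluating an expression."""
    return value

with ThreadPoolExecutor(max_workers=2) as pool:
    fx = pool.submit(remote_eval, 1)  # dispatched immediately, returns a future
    fy = pool.submit(remote_eval, 2)  # runs concurrently with the first query
    a = fx.result() + fy.result()     # blocks only here, where values are needed
    pool.submit(remote_eval, 3)       # result never read: effectively asynchronous

print(a)
```

As in ztsdb, nothing forces the caller to wait until a result is actually consumed, so whether a query behaves synchronously depends only on how its result is used.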

# Timers

It is possible to repeat the execution of code at regular intervals. A timer creates a new interpretation context, and when the timer is destroyed the interpretation context is torn down too. To avoid this, a timer can be declared in global scope.

A timer has the following signature:

```
function(duration, loop, once=NULL, loop_max=0)
```

*loop_max* indicates the number of repetitions. A value of 0
indicates infinite repetitions. The *once* argument takes an
expression that is evaluated only once. It is useful for example for
creating local variables and setting up the job that will be done by
the *loop* code. The latter is an expression that will be run at each
timer expiry.

Timers are useful for a large variety of tasks: data distribution and backup, data transformation, etc. One example is calculating per-minute means and storing them in a time-series available for querying.

# Built-in functions

This source file has a list of the built-in functions together with their signatures and the allowable parameter types. For most of these functions, the functionality and parameters are the same as in R.

# Environments and assignments

## Environments

ztsdb has a notion of environment, but environments are not a first-class type as they are in R. Another difference is that ztsdb has dynamic scoping (see Scoping).

The environment hierarchy is the following: "base" <- "global" <- ... <- "current"

- "base" is the parent environment and contains all the built-in functions
- "global" is the parent of all interpretation context environments, and the place where global variables (e.g. time series that should be shared between different interpretation contexts) should be defined
- the ellipsis stands for a set of 0..n environments
- the environment after "global" is always the interpretation context environment

## Managing environment content

The functions that help manage an environment's content are
*assign*, *get*, *ls* and *rm* (and its synonym *remove*), and
they work roughly as in R. For convenience the function *lsg* is
provided; it is the same as *ls* except that *name* is initialized by
default to "global", so it lists the content of the
global environment.

The signatures are:

```
assign(x, value, envir="current", inherits=FALSE)
get(x, envir="current", inherits=FALSE)
ls(name="current")
lsg(name="global")
rm(..., list=character(), envir="current", inherits=FALSE)
```

## Assignments

### Simple assign

The simple assign operator (*<-*) always assigns to a variable in the
current environment. This means that a variable created by the simple
assign will never be visible to another interpretation context. If a
variable is created in a function then the variable will be local to
this function.

### Special assign

The special assign operator (*<<-*) works like in R. It looks in the
current environment for the variable, and if it is not found it moves
up to the parent environment and so on. If it doesn't find a pre-existing
variable and it gets to the "global" environment, then it creates a
new variable there and makes the assignment.

Caution must be exercised when using the special assign to declare
global variables, because it might result in an assignment unwittingly
occurring in a child of the "global" environment. A safer way to
achieve global assignment is to use the *assign* function and
specify the parameter *envir* as "global":

```
a <<- 123 # dangerous if 'a' is already defined in a child environment
assign("a", 123, envir="global") # safe
```
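
The lookup performed by the special assign, and the pitfall mentioned above, can be modelled with a small Python sketch (not ztsdb code). The `Env` class and `special_assign` function are hypothetical, and the search is assumed to start at an environment at or below "global".

```python
class Env:
    """A named environment with a parent link and its own variables."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.vars = {}

def special_assign(env, key, value):
    """Model of `<<-`: assign in the nearest environment defining key,
    or create the variable in "global" if none is found on the way up.
    Returns the name of the environment that received the assignment."""
    e = env
    while e.name != "global" and key not in e.vars:
        e = e.parent
    e.vars[key] = value
    return e.name

base = Env("base")
glob = Env("global", base)
ctx = Env("context", glob)   # interpretation context environment
fn = Env("function", ctx)    # environment of a function call

print(special_assign(fn, "a", 123))  # no 'a' anywhere: created in "global"
ctx.vars["b"] = 1
print(special_assign(fn, "b", 2))    # 'b' exists in a child: "global" is never reached
```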

# Errors

Compared to R, ztsdb has a simplified mechanism for handling errors
and there is no concept of warning. Errors can only be captured via
*.Last.error*.

ztsdb implements, like in R, a try/catch mechanism which can be used like this:

```
### the following assigns -1 to 'a'
error_string <- "invalid type for binary operator (double + string)"
a <- tryCatch(1 + "a", if (.Last.error==error_string) -1)
a # 'a' has value -1
```

# Durability

Permanent objects are arrays and *zts* that are declared with the
*file* parameter (see Arrays). The
objects are then memory-mapped to a set of files in the directory
indicated by *file*. To allow a deterministic file state, the function
*msync* is provided. It has the signature:

```
function(x, async=FALSE)
```

The *async* parameter determines if the operation is asynchronous or
not.