To begin, we’ll load foqat and show three datasets in foqat:
aqi is a dataset about time series of air quality with 1-minute resolution.
voc is a dataset about time series of volatile organic compounds with 1-hour resolution.
met is a dataset about time series of meteorological conditions with 5-minute resolution.
library(foqat)
head(aqi)
#> Time NO NO2 CO SO2 O3
#> 1 2017-05-01 01:00:00 0.0376578 2.79326 0.256900 NA 56.5088
#> 2 2017-05-01 01:01:00 0.0341483 2.76094 0.254692 NA 57.0546
#> 3 2017-05-01 01:02:00 0.0310285 2.65239 0.265178 NA 57.6654
#> 4 2017-05-01 01:03:00 0.0357016 2.60257 0.269691 NA 58.7863
#> 5 2017-05-01 01:04:00 0.0337507 2.59527 0.273395 NA 59.0342
#> 6 2017-05-01 01:05:00 0.0238120 2.57260 0.276464 NA 59.2240
head(voc)
#> Time Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00 0.233 0.1750 0.544 0.020 0.1020
#> 2 2020-05-01 01:00:00 0.376 0.2025 0.704 0.028 0.1045
#> 3 2020-05-01 02:00:00 0.519 0.2300 0.864 0.036 0.1070
#> 4 2020-05-01 03:00:00 0.805 0.2850 1.184 0.052 0.1120
#> 5 2020-05-01 04:00:00 0.658 0.2920 1.304 0.075 0.1230
#> 6 2020-05-01 05:00:00 0.538 0.3700 0.904 0.049 0.1110
head(met)
#> Time TEM HUM WS WD
#> 1 2017-05-01 00:00:00 21.4 87.0 3.0 39
#> 2 2017-05-01 00:05:00 21.2 86.7 3.6 68
#> 3 2017-05-01 00:10:00 21.0 86.3 3.5 76
#> 4 2017-05-01 00:15:00 20.9 85.8 3.4 73
#> 5 2017-05-01 00:20:00 20.8 86.0 2.8 68
#> 6 2017-05-01 00:25:00 20.8 86.0 2.3 68
Summarize time series
statdf() computes summary statistics for each column of a time series:
statdf(aqi)
#> mean sd min 25% 50% 75% max integrity
#> NO 0.33 0.61 -0.08 0.08 0.13 0.37 19.02 0.765
#> NO2 3.06 2.68 -0.15 1.07 2.21 4.13 20.53 0.786
#> CO 0.30 0.09 0.17 0.25 0.27 0.34 0.73 0.709
#> SO2 1.80 2.76 -0.15 0.25 0.97 2.11 34.08 0.734
#> O3 52.86 19.53 7.95 38.89 49.50 64.38 106.61 0.783
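To make the columns of this table concrete, here is a minimal base-R sketch that computes the same kinds of statistics for a small synthetic data frame. The helper summarise_column() is hypothetical and not part of foqat, and reading "integrity" as the share of non-missing values is an assumption, not something the output above confirms.

```r
# Hypothetical helper, not part of foqat: summary statistics for one numeric vector.
summarise_column <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
  c(mean = mean(x, na.rm = TRUE),
    sd = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    q25 = q[1], q50 = q[2], q75 = q[3],
    max = max(x, na.rm = TRUE),
    # Assumption: "integrity" is read here as the share of non-missing values.
    integrity = mean(!is.na(x)))
}

# Synthetic data with one missing value per column (not taken from aqi).
toy <- data.frame(NO = c(0.1, 0.2, NA, 0.4), O3 = c(50, NA, 54, 58))

# One row per column of toy, one column per statistic.
stats <- t(sapply(toy, summarise_column))
round(stats, 3)
```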
Resample time series
We can resample time series by using trs(), which returns a new time series with a new resolution and complete timestamps.
You can use bkip to set the new time resolution.
The time series can be clipped by using st (start time) and et (end time).
The default resampling function is mean. Wind data can be handled by setting wind to TRUE and specifying coliws (the column index of the wind speed) and coliwd (the column index of the wind direction).
new_met=trs(met, bkip = "1 hour", st = "2017-05-01 01:00:00", wind = TRUE, coliws = 4, coliwd = 5)
#> Joining, by = "temp_datetime"
head(new_met)
#> Time TEM HUM WS WD
#> 1 2017-05-01 01:00:00 21.18333 83.15833 4.555427 72.52891
#> 2 2017-05-01 02:00:00 21.54167 77.62500 4.238292 72.02753
#> 3 2017-05-01 03:00:00 20.71667 80.22500 5.287611 82.34847
#> 4 2017-05-01 04:00:00 20.52500 79.80000 5.653918 89.15400
#> 5 2017-05-01 05:00:00 21.12500 61.41667 7.417430 98.62400
#> 6 2017-05-01 06:00:00 21.30000 51.44167 8.401939 89.26818
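Wind direction needs special treatment because it is circular: the arithmetic mean of 350 and 10 degrees is 180, which points the wrong way. A common remedy is vector averaging, sketched below in base R. Whether trs() uses exactly this method internally is an assumption; vector_mean_wind() is a hypothetical helper and not part of foqat.

```r
# Hypothetical helper, not part of foqat: vector-average wind speed and direction.
# ws: wind speed; wd: wind direction in degrees (the direction the wind blows from).
vector_mean_wind <- function(ws, wd) {
  rad <- wd * pi / 180
  u <- mean(-ws * sin(rad), na.rm = TRUE)  # mean eastward component
  v <- mean(-ws * cos(rad), na.rm = TRUE)  # mean northward component
  c(WS = sqrt(u^2 + v^2),
    WD = (atan2(-u, -v) * 180 / pi) %% 360)
}

# Two equally strong winds from 350 and 10 degrees average to a northerly wind
# (a direction of 0, or equivalently 360, degrees), not to 180 degrees.
res <- vector_mean_wind(ws = c(2, 2), wd = c(350, 10))
res
```

Note that the vector-averaged speed is never larger than the plain mean of the speeds, because opposing components cancel.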
You can also change the resampling function from the default to sum, median, min, max, sd, or quantile. If you choose quantile, you also need to set probs (e.g., 0.5).
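The idea behind resampling with a quantile can be sketched in base R on synthetic data: assign every row to the start of its hour, then summarise within each hour. This is only an illustration of the concept, not the implementation of trs(); unlike trs(), it does not fill in missing timestamps.

```r
# Synthetic 5-minute series covering two hours (not the met dataset).
times <- seq(as.POSIXct("2017-05-01 00:00:00", tz = "UTC"),
             by = "5 min", length.out = 24)
toy <- data.frame(Time = times, TEM = seq(20, by = 0.1, length.out = 24))

# Assign each row to the start of its hour.
toy$hour_start <- as.POSIXct(trunc(toy$Time, units = "hours"))

# Summarise within each hour using the 0.5 quantile (the median).
hourly <- aggregate(TEM ~ hour_start, data = toy,
                    FUN = function(x) quantile(x, probs = 0.5, names = FALSE))
hourly
```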
Calculate the variation of time series
svri() helps you compute the variation of time series (e.g., the maximum of all values grouped by hour of day).
The parameters bkip, st, et, and fun are the same as in trs(). Wind data is handled just as in trs().
mode selects the calculation mode, and value is the sub-parameter of mode. There are three modes, recipes, ncycle, and custom, which are introduced below:
mode = recipes
recipes stands for built-in solutions.
The mode recipes accepts three values: day, week, and month.
day means the time series will be grouped by hour of day, from 0 to 23.
week means the time series will be grouped by day of week, from 1 to 7.
month means the time series will be grouped by day of month, from 1 to 31.
Below is an example which calculates the median values for a time series grouped by hour of day (e.g., 0:00, 1:00 …).
new_voc=svri(voc, bkip="1 hour", mode="recipes", value="day", fun="median")
#> Joining, by = "temp_datetime"
head(new_voc)
#> hour of day Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
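mode = ncycle
ncycle stands for grouping time series by the order number of each row within each cycle.
Below is an example which calculates the median values for each position in a cycle of 24 rows. Because the data are hourly and start at 00:00, these positions correspond to the hours of the day (e.g., 0:00, 1:00 …).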
new_voc=svri(voc, bkip="1 hour", st="2020-05-01 00:00:00", mode="ncycle", value=24, fun="median")
#> Joining, by = "temp_datetime"
head(new_voc)
#> cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
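The grouping behind ncycle can be sketched in base R with the modulo operator. This is a minimal illustration on synthetic data, assuming value is the number of rows per cycle; it is not the implementation of svri().

```r
# Synthetic hourly series spanning three days (not the voc dataset).
n <- 72
toy <- data.frame(
  Time = seq(as.POSIXct("2020-05-01 00:00:00", tz = "UTC"),
             by = "1 hour", length.out = n),
  x = rep(0:23, times = 3) + rep(c(0, 10, 20), each = 24)
)

# Position of each row within a cycle of 24 rows, counted from 0.
toy$cycle <- (seq_len(n) - 1) %% 24

# The series starts at midnight, so the position equals the hour of day.
same_as_hour <- all(toy$cycle == as.integer(format(toy$Time, "%H")))

# Median of x for each position in the cycle.
by_cycle <- aggregate(x ~ cycle, data = toy, FUN = median)
head(by_cycle)
```

This is why a cycle of 24 rows on hourly data starting at 00:00 gives the same grouping as hour of day.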
mode = custom
custom stands for grouping time series by a reference column in the time series. If you select mode = custom, value stands for the column index of the reference column. Below is an example which calculates the median values for a time series grouped by hour (e.g., 0:00, 1:00 …).
#add a new column that holds the hour of day.
voc$hour=lubridate::hour(voc$Time)
#group by the reference column, given by its column index (the new hour column is column 7).
new_voc=svri(voc, bkip = "1 hour", mode="custom", value=7, fun="median")
head(new_voc[,-2])
#> custom cycle Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 0 0.461 0.3555 0.583 0.051 0.1020
#> 2 1 0.581 0.3710 0.704 0.048 0.1045
#> 3 2 0.583 0.4020 0.864 0.041 0.1120
#> 4 3 0.805 0.4530 1.184 0.052 0.1220
#> 5 4 0.658 0.4180 1.304 0.075 0.1230
#> 6 5 0.572 0.5620 0.923 0.049 0.1210
#remove the modified copy of voc
rm(voc)
Calculate the average variation
avri() is a customized version of svri() which helps you calculate the average variation (with standard deviation) of time series.
The output is a data frame which contains both the average variations and the standard deviations. For example, for a time series of 3 species, the second to fourth columns are the average variations, and the fifth to seventh columns are the standard deviations.
new_voc=avri(voc, bkip = "1 hour", st = "2020-05-01 01:00:00")
#> Joining, by = "temp_datetime"
head(new_voc)
#> hour of day Propylene_ave Acetylene_ave n.Butane_ave trans.2.Butene_ave
#> 1 0 0.735375 0.48525 1.1655 0.0695
#> 2 1 0.737650 0.39920 1.0683 0.0575
#> 3 2 0.831800 0.37320 1.1748 0.0534
#> 4 3 1.420300 0.38060 2.2370 0.0910
#> 5 4 1.051800 0.42100 1.8614 0.0664
#> 6 5 1.133200 0.59140 1.8872 0.0604
#> Cyclohexane_ave Propylene_sd Acetylene_sd n.Butane_sd trans.2.Butene_sd
#> 1 0.0910 0.5876756 0.2137766 1.1091851 0.04648835
#> 2 0.1034 0.5677007 0.2099156 0.9505957 0.03992493
#> 3 0.1098 0.6452141 0.1634861 0.8482224 0.02864961
#> 4 0.1210 1.5527906 0.1721926 2.1616755 0.06951619
#> 5 0.1392 0.8127953 0.2090957 1.4807749 0.03008820
#> 6 0.1652 0.8916562 0.2569549 1.6192165 0.02475480
#> Cyclohexane_sd
#> 1 0.02184414
#> 2 0.02442181
#> 3 0.02115892
#> 4 0.02736786
#> 5 0.06293012
#> 6 0.10819057
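The output layout described above (averages first, then standard deviations, with _ave and _sd suffixes) can be reproduced on a small scale in base R. This is a sketch on synthetic data with a ready-made hour column; it only mirrors the shape of the result and is not how avri() is implemented.

```r
# Synthetic data: two species observed at hours 0 and 1 on three days.
toy <- data.frame(
  hour = rep(0:1, times = 3),
  Propylene = c(0.2, 0.4, 0.4, 0.6, 0.6, 0.8),
  Acetylene = c(0.1, 0.3, 0.2, 0.3, 0.3, 0.3)
)
species <- c("Propylene", "Acetylene")

# Mean and standard deviation of each species for each hour.
ave <- aggregate(toy[species], by = list(hour = toy$hour), FUN = mean)
sdev <- aggregate(toy[species], by = list(hour = toy$hour), FUN = sd)

# Label the columns and put averages and standard deviations side by side.
names(ave)[-1] <- paste0(species, "_ave")
names(sdev)[-1] <- paste0(species, "_sd")
result <- merge(ave, sdev, by = "hour")
result
```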
Convert time series into proportion time series
prop() helps you convert time series into proportion time series (e.g., convert a time series of concentrations of species into a time series of contributions of species).
prop_voc=prop(voc)
head(prop_voc)
#> Time Propylene Acetylene n.Butane trans.2.Butene Cyclohexane
#> 1 2020-05-01 00:00:00 0.2169460 0.1629423 0.5065177 0.01862197 0.09497207
#> 2 2020-05-01 01:00:00 0.2657244 0.1431095 0.4975265 0.01978799 0.07385159
#> 3 2020-05-01 02:00:00 0.2955581 0.1309795 0.4920273 0.02050114 0.06093394
#> 4 2020-05-01 03:00:00 0.3301887 0.1168991 0.4856440 0.02132896 0.04593929
#> 5 2020-05-01 04:00:00 0.2683524 0.1190865 0.5318108 0.03058728 0.05016313
#> 6 2020-05-01 05:00:00 0.2728195 0.1876268 0.4584178 0.02484787 0.05628803
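The proportions above are consistent with dividing each value by the sum of its row. The base-R sketch below checks this on the first two rows of voc as printed earlier in this document; it is an illustration of the arithmetic, not the code of prop().

```r
# First two rows of voc, copied from the head(voc) output (Time column omitted).
voc_head <- data.frame(
  Propylene = c(0.233, 0.376),
  Acetylene = c(0.1750, 0.2025),
  n.Butane = c(0.544, 0.704),
  trans.2.Butene = c(0.020, 0.028),
  Cyclohexane = c(0.1020, 0.1045)
)

# Divide every value by the total of its row, so each row sums to 1.
props <- voc_head / rowSums(voc_head)
round(props, 7)
```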
Analysis of linear regression for time series in batch
anylm() allows you to run linear regressions for time series in batch.
xd are the indices of the columns you want to put on the x axis (independent variables).
yd are the indices of the columns you want to put on the y axis (dependent variables).
zd are the indices of the columns you want to use as color scales.
td are the indices of the columns you want to use as a basis for grouping.
A simple example is demonstrated below to illustrate the functionality.
This example uses the built-in dataset aqi. Grouped by day, it explores the correlation of O3 with NO and NO2 for each day, and it explores the effect of CO on these correlations by using CO as the fill color.
df=data.frame(aqi,day=lubridate::day(aqi$Time))
lr_result=anylm(df, xd=c(2,3), yd=6, zd=4, td=7,dign=3)
View(lr_result)
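To show what a batch of grouped regressions looks like, here is a minimal base-R sketch that fits one linear model per day on synthetic data. The column names of lr_toy (slope, intercept, r_squared) are hypothetical and may not match the columns that anylm() returns.

```r
set.seed(1)

# Synthetic data: ten NO2 values on each of two days (not the aqi dataset).
toy <- data.frame(day = rep(1:2, each = 10), NO2 = rep(1:10, times = 2))

# O3 falls with NO2 on day 1 and rises with NO2 on day 2, plus a little noise.
toy$O3 <- ifelse(toy$day == 1, 60 - 2 * toy$NO2, 30 + toy$NO2) +
  rnorm(20, sd = 0.1)

# Fit O3 ~ NO2 separately for each day and collect the key numbers.
fits <- lapply(split(toy, toy$day), function(d) {
  fit <- lm(O3 ~ NO2, data = d)
  data.frame(day = d$day[1],
             slope = unname(coef(fit)["NO2"]),
             intercept = unname(coef(fit)["(Intercept)"]),
             r_squared = summary(fit)$r.squared)
})
lr_toy <- do.call(rbind, fits)
lr_toy
```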