Standard error of the mean of correlated series
The standard error of the mean of a series with positive autocorrelation is larger than that of an uncorrelated series, because the variance of the mean includes positive covariance terms between records in addition to the variance of each record.
Effective number of observations
For an uncorrelated series, the variance of the mean equals the variance of a single record divided by the number of records $n$.
\[Var(\bar{x}) = {Var(x) \over n}\]
For a correlated series, the effective number of observations $n_{eff}$ is defined so that the same scaling holds.
\[Var(\bar{x}) = {Var(x) \over n_{eff}}\]
In other words, the series contains only $n_{eff}$ effective observations.
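The scaling above can be checked numerically. The following is a minimal sketch in plain Julia that does not use the package: it assumes unit variance for each record, a hypothetical autocorrelation of 0.4 at lag 1 and 0.1 at lag 2, and zero correlation at larger lags. The names `rho`, `Sigma`, and `var_mean` are introduced here for illustration only.

```julia
# Correlation matrix of n records whose autocorrelation is given by acf
# (acf[1] is lag 0); lags beyond length(acf)-1 are assumed to be uncorrelated.
n = 100
acf = [1.0, 0.4, 0.1]
rho(k) = k < length(acf) ? acf[k + 1] : 0.0
Sigma = [rho(abs(i - j)) for i in 1:n, j in 1:n]

# Var(mean) = (1/n^2) * sum of all variances and covariances; Var(x) = 1 here
var_mean = sum(Sigma) / n^2

# The effective number of observations follows from Var(mean) = Var(x) / n_eff
neff = 1 / var_mean   # about 50.3 instead of 100
```

With these assumed correlations, the 100 records constrain the mean only about as well as 50 independent records would.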
LogNormals.effective_n_cor — Function
effective_n_cor(x, ms::MissingStrategy=PassMissing())
effective_n_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())
Compute the number of effective observations for an autocorrelated series.
Arguments
- x: An iterator of a series of observations.
- ms: MissingStrategy. If not given, defaults to PassMissing(). Set to ExactMissing() to consciously handle missing values in x.
- acf: Autocorrelation function starting from lag 0. If not given, defaults to autocor(x, ms).
The formula in Zieba 2011 has been extended to handle missing values:
\[n_{eff} = \frac{n_F}{1+{2 \over n_F} \sum_{k=1}^{\min(n-1,n_k)} (n-k-m_k) \rho_k}\]
where $n$ is the total number of records, $n_F$ is the number of finite records, $n_k$ is the number of components in the autocorrelation function used ($n-1$ if not estimated from the data), $\rho_k$ is the correlation at lag $k$, and $m_k$ is the number of pairs at lag $k$ that contain a missing value.
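The formula can be written out directly. The sketch below is a plain-Julia illustration, not the package implementation: `neff_sketch` is a hypothetical name, `x` is assumed to be an indexable vector, and $n_k$ is taken to be the number of lags beyond lag 0 in the supplied `acf`.

```julia
# Hypothetical helper: effective number of observations with missing values.
# acf[1] is lag 0, so acf[k + 1] is the correlation at lag k.
function neff_sketch(x::AbstractVector, acf::AbstractVector)
    n = length(x)                        # total number of records
    nF = count(!ismissing, x)            # number of finite records
    kmax = min(n - 1, length(acf) - 1)   # assumption: n_k = lags beyond lag 0
    s = 0.0
    for k in 1:kmax
        # m_k: pairs (x[i], x[i+k]) that contain at least one missing value
        mk = count(i -> ismissing(x[i]) || ismissing(x[i + k]), 1:(n - k))
        s += (n - k - mk) * acf[k + 1]
    end
    return nF / (1 + 2 / nF * s)
end

x = Vector{Union{Missing,Float64}}(ones(100))
x[11:20] .= missing
neff_gap = neff_sketch(x, [1.0, 0.4, 0.1])            # about 45.6
neff_full = neff_sketch(ones(100), [1.0, 0.4, 0.1])   # about 50.3
```

The result depends only on the positions of the missing values and on the autocorrelation function, not on the observed values themselves.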
Details
Missing values are not handled by default, i.e. the number of effective observations is missing if there are any missing values in x. The recommended way is to use ExactMissing(). Alternatively, set ms to SkipMissing() to speed up computation (by internally omitting the count_forlags count of missing pairs) at the cost of a positively biased result, with the bias increasing with the number of missing values. This in turn leads to an underestimated uncertainty of the sum or the mean.
Examples
using Distributions, DistributionVectors, Missings, MissingStrategies, LinearAlgebra
acf0 = [1,0.4,0.1]
Sigma = cormatrix_for_acf(100, acf0);
# 100 random variables each Normal(1,1)
dmn = MvNormal(ones(100), Symmetric(Sigma));
x = allowmissing(rand(dmn));
x[11:20] .= missing
neff = effective_n_cor(x, acf0, ExactMissing())
neff < 90
neff_biased = effective_n_cor(x, acf0, SkipMissing())
neff_biased > neff
Standard error of the mean of a correlated series
LogNormals.sem_cor — Function
sem_cor(x, ms::MissingStrategy=PassMissing())
sem_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())
Estimate the standard error of the mean of an autocorrelated series: $Var(\bar{x}) = {Var(x) \over n_{eff}}$.
Arguments
- x: An iterator of a series of observations.
- acf: Autocorrelation function starting from lag 0.
- ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missing values.
Optional Arguments
- neff: a precomputed effective number of observations may be provided for efficiency.
By default, the empirical autocorrelation is estimated from the given series. If possible, use a more precise estimate from a longer series. For example, when computing the daily means of an hourly time series, estimate the empirical autocorrelation from a monthly or annual series and provide it to the daily applications of sem_cor using argument acf.
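The pattern of reusing one autocorrelation estimate across many short chunks can be sketched in plain Julia as follows. This is an illustration for complete series only (no missing values), not the package code: `neff_complete` and `sem_sketch` are hypothetical helpers, `acf_long` is an invented autocorrelation function standing in for an estimate from a long record, and the hourly data are synthetic.

```julia
using Statistics

# n_eff for a complete series of length n and an acf starting at lag 0
function neff_complete(n::Integer, acf::AbstractVector)
    s = 0.0
    for k in 1:min(n - 1, length(acf) - 1)
        s += (n - k) * acf[k + 1]
    end
    return n / (1 + 2 / n * s)
end

# Standard error of the mean: sqrt(Var(x) / n_eff), with Var(x) from Zieba's formula
function sem_sketch(x::AbstractVector, acf::AbstractVector)
    n = length(x)
    neff = neff_complete(n, acf)
    varx = neff / (n * (neff - 1)) * sum(abs2, x .- mean(x))
    return sqrt(varx / neff)
end

acf_long = [1.0, 0.6, 0.3]   # hypothetical: pretend it was estimated from a long record
hourly = [sin(2pi * t / 24) + 0.1 * t / 24 for t in 1:72]   # three synthetic days
days = [hourly[(24d - 23):(24d)] for d in 1:3]

daily_sem = [sem_sketch(day, acf_long) for day in days]
naive_sem = [std(day) / sqrt(length(day)) for day in days]
```

Because the assumed correlations are positive, each entry of `daily_sem` is larger than the corresponding naive estimate that treats the 24 hourly records as independent.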
Variance of a correlated series
LogNormals.var_cor — Function
var_cor(x, ms::MissingStrategy=PassMissing())
var_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())
Estimate the variance for an autocorrelated series.
Zieba 2011 provides the following formula:
\[Var(x) = \frac{n_{eff}}{n (n_{eff}-1)} \sum \left( x_i - \bar{x} \right)^2 = {(n-1) n_{eff} \over n (n_{eff}-1)} Var_{uncor}(x)\]
Arguments
- x: An iterator of a series of observations.
- acf: Autocorrelation function starting from lag 0.
- ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missing values.
Optional Arguments
- neff: a precomputed effective number of observations may be provided for efficiency.
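The two forms of the variance formula can be compared numerically. The sketch below is plain Julia for a complete series; `var_cor_sketch` is a hypothetical name, and both the data and the value of `neff` are invented for illustration.

```julia
using Statistics

# First form: n_eff / (n (n_eff - 1)) * sum of squared deviations from the mean
function var_cor_sketch(x::AbstractVector, neff::Real)
    n = length(x)
    return neff / (n * (neff - 1)) * sum(abs2, x .- mean(x))
end

x = [1.2, 0.8, 1.1, 1.5, 0.9, 1.0]
n = length(x)
neff = 3.5   # hypothetical effective number of observations

v = var_cor_sketch(x, neff)
# Second form: correction factor applied to the usual sample variance
v_alt = (n - 1) * neff / (n * (neff - 1)) * var(x)
```

Both forms agree up to floating point error, and for `neff < n` the corrected variance is larger than the usual sample variance `var(x)`.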
Effective autocorrelation function
LogNormals.autocor_effective — Function
autocor_effective(x, ms::MissingStrategy=PassMissing())
autocor_effective(x, acf)
Estimate the effective autocorrelation function for series x.
Arguments
- x: An iterator of a series of observations.
- ms: MissingStrategy passed to autocor.
- acf: Autocorrelation function starting from lag 0.
Notes
- The effective autocorrelation function consists of the first coefficients of the autocorrelation function, up to but excluding the first negative coefficient.
- According to Zieba 2011, using this effective version rather than the full version when estimating the autocorrelation function from the data yields a better result for the standard error of the mean (sem_cor).
- The optional argument acf allows the caller to provide a precomputed estimate of the autocorrelation function (see autocor).
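The truncation rule in the first note can be sketched in a few lines of plain Julia. `autocor_effective_sketch` is a hypothetical name that operates on an already estimated autocorrelation function, and the coefficients are invented.

```julia
# Keep the coefficients before the first negative one; keep all if none is negative.
function autocor_effective_sketch(acf::AbstractVector)
    i = findfirst(<(0), acf)
    return isnothing(i) ? copy(acf) : acf[1:(i - 1)]
end

acf_eff = autocor_effective_sketch([1.0, 0.5, 0.2, -0.1, 0.05])   # [1.0, 0.5, 0.2]
acf_all = autocor_effective_sketch([1.0, 0.5, 0.2])               # unchanged
```

Note that the positive coefficient 0.05 after the first negative one is dropped as well.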
Autocorrelation of a series with missing values
StatsBase.autocor — Function
autocor(x::AbstractVector{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractVector{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
Estimate the autocorrelation function accounting for missing values.
Arguments
- x: series or matrix with series in columns, which may contain missing values.
- lags: Integer vector of the lags for which the correlation should be computed.
- ms: MissingStrategy. Defaults to PassMissing(). Set to ExactMissing() to divide the sum in the formula for the correlation at lag k by n - nmissing instead of n, where nmissing is the number of records with a missing value in either the original vector or its lagged version (see count_forlags).
- dmean: if false, assume mean(x)==0.
If the missing strategy is set to SkipMissing(), the computation is faster, but the result is increasingly biased low as the number of missing values grows. Note that, for reasons of numerical stability, StatsBase.autocor divides by n instead of n-k, the true length of the vectors correlated at lag k, which results in low-biased correlations at higher lags.
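One possible reading of the n - nmissing rule is sketched below in plain Julia. It is an interpretation of the description above, not the package implementation: `autocor_missing_sketch` is a hypothetical name, the mean is always removed, and the lag-k sum of products over complete pairs is divided by n - nmissing and then normalized by the lag-0 value.

```julia
using Statistics

function autocor_missing_sketch(x::AbstractVector, lags)
    n = length(x)
    m = mean(skipmissing(x))
    z = [ismissing(v) ? missing : v - m for v in x]          # demeaned series
    c0 = sum(abs2, skipmissing(z)) / count(!ismissing, z)    # lag-0 value
    map(lags) do k
        prods = [z[i] * z[i + k] for i in 1:(n - k)]   # missing if either member is missing
        nmissing = count(ismissing, prods)             # pairs containing a missing value
        sum(skipmissing(prods)) / (n - nmissing) / c0  # assumes at least one complete pair
    end
end

r_full = autocor_missing_sketch([1.0, 2.0, 3.0, 4.0, 5.0], 0:1)       # [1.0, 0.4]
r_gap = autocor_missing_sketch([1.0, 2.0, missing, 4.0, 5.0], 0:1)
```

For a series without missing values this reduces to dividing by n at every lag, which is the convention described for StatsBase.autocor above.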
LogNormals.count_forlags — Function
count_forlags(pred, x, lags)
count_forlag(pred, x, k::Integer)
Count the number of pairs for lag k which fulfil a predicate.
Arguments
- pred::Function(x_i,x_iplusk)::Bool: The predicate to be applied to each pair.
- x: The series whose lags are inspected.
- lags: An iterator of Integer lag sizes.
- k: A single lag.
A common case is counting the pairs that contain a missing value when computing the autocorrelation, using the predicate missinginpair(x,y) = ismissing(x) || ismissing(y).
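The counting itself can be sketched in plain Julia as follows. The functions with the `_sketch` suffix are hypothetical stand-ins for illustration, not the package functions, and `x` is assumed to be an indexable vector.

```julia
# Count the pairs (x[i], x[i+k]) that fulfil the predicate for a single lag k
count_forlag_sketch(pred, x, k::Integer) =
    count(i -> pred(x[i], x[i + k]), 1:(length(x) - k))

# Apply the single-lag count to each lag in turn
count_forlags_sketch(pred, x, lags) = [count_forlag_sketch(pred, x, k) for k in lags]

missinginpair(a, b) = ismissing(a) || ismissing(b)

x = [1.0, missing, 3.0, 4.0, missing, 6.0]
nmiss = count_forlags_sketch(missinginpair, x, 0:2)   # [2, 4, 2]
```

At lag 0 each record is paired with itself, so the count equals the number of missing records. At lag 1 each isolated missing value affects two pairs.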