Standard error of the mean of correlated series

The standard error of the mean of a series with positive autocorrelation is larger than that of an uncorrelated series, because the variance of the mean includes positive covariance terms between records in addition to the variance of each record.

Effective number of observations

For an uncorrelated series, the variance of the mean is the variance of a single record divided by the number of records $n$.

\[Var(\bar{x}) = {Var(x) \over n}\]

For a correlated series, the effective number of observations $n_{eff}$ is defined so that the same scaling holds.

\[Var(\bar{x}) = {Var(x) \over n_{eff}}\]

In terms of the uncertainty of the mean, the series contains only $n_{eff}$ effective observations.
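
The scaling can be traced back to the covariance terms in the variance of the mean. A short derivation, assuming a stationary series without missing values, where $\rho_k$ denotes the autocorrelation at lag $k$:

\[Var(\bar{x}) = {1 \over n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} Cov(x_i, x_j) = {Var(x) \over n^2} \left( n + 2 \sum_{k=1}^{n-1} (n-k) \rho_k \right) = {Var(x) \over n} \left( 1 + {2 \over n} \sum_{k=1}^{n-1} (n-k) \rho_k \right)\]

Comparing this with the definition gives

\[n_{eff} = \frac{n}{1+{2 \over n} \sum_{k=1}^{n-1} (n-k) \rho_k}\]

which equals $n$ for an uncorrelated series and is smaller than $n$ when the weighted sum of correlations is positive.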

LogNormals.effective_n_cor — Function

effective_n_cor(x, ms::MissingStrategy=PassMissing()) 
effective_n_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Compute the number of effective observations for an autocorrelated series.

Arguments

x: An iterator of a series of observations.
ms: MissingStrategy. Defaults to PassMissing(). Set to ExactMissing() to consciously handle missing values in x.
acf: Autocorrelation function starting from lag 0. If not given, defaults to autocor(x, ms).

The formula in Zieba has been extended for missing values:

\[n_{eff} = \frac{n_F}{1+{2 \over n_F} \sum_{k=1}^{min(n-1,n_k)} (n-k-m_k) \rho_k}\]

where $n$ is the total number of records, $n_F$ is the number of finite records, $n_k$ is the number of components in the autocorrelation function used ($n-1$ if not estimated from the data), $\rho_k$ is the correlation at lag $k$, and $m_k$ is the number of pairs that contain a missing value for lag $k$.

Details

Missing values are not handled by default, i.e. the number of effective observations is missing if there are any missing values in x. The recommended way is to use ExactMissing(). Alternatively, set ms to SkipMissing() to speed up the computation (by internally omitting the count of missing pairs by count_forlags) at the cost of a positively biased result, with the bias increasing with the number of missing values. This bias in turn leads to an underestimated uncertainty of the sum or the mean.

Examples

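using LogNormals  # provides effective_n_cor
using Distributions, DistributionVectors, Missings, MissingStrategies, LinearAlgebra
# autocorrelation function for lags 0, 1 and 2
acf0 = [1,0.4,0.1]
Sigma = cormatrix_for_acf(100, acf0);
# 100 random variables each Normal(1,1)
dmn = MvNormal(ones(100), Symmetric(Sigma));
x = allowmissing(rand(dmn));
x[11:20] .= missing
# exact handling of the 10 missing values: fewer than the 90 finite records
neff = effective_n_cor(x, acf0, ExactMissing())
neff < 90
# faster, but biased towards a higher number of effective observations
neff_biased = effective_n_cor(x, acf0, SkipMissing())
neff_biased > neff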

source
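
For a series without missing values, the formula above reduces to $n_F = n$ and $m_k = 0$. The following is a minimal sketch of that special case in plain Julia. The helper name `neff_complete` is hypothetical and not part of LogNormals, and the sketch assumes that the highest usable lag is the smaller of $n-1$ and the highest lag contained in `acf`.

```julia
# Hypothetical helper for illustration; not the LogNormals implementation.
# `acf` starts at lag 0, so acf[k + 1] holds the correlation at lag k.
function neff_complete(n::Integer, acf::AbstractVector{<:Real})
    kmax = min(n - 1, length(acf) - 1)   # highest lag that can be used
    s = 0.0
    for k in 1:kmax
        s += (n - k) * acf[k + 1]        # n - k pairs contribute at lag k
    end
    return n / (1 + 2 / n * s)
end

neff_uncor = neff_complete(100, [1.0])           # no correlation: n_eff equals n
neff_cor = neff_complete(100, [1.0, 0.4, 0.1])   # positive correlation: n_eff < n
```

With the autocorrelation function of the example above, roughly half of the 100 records count as effective observations.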

Standard error of the mean of a correlated series

LogNormals.sem_cor — Function

sem_cor(x, ms::MissingStrategy=PassMissing())
sem_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Estimate the standard error of the mean of an autocorrelated series, i.e. the square root of $Var(\bar{x}) = {Var(x) \over n_{eff}}$.

Arguments

x: An iterator of a series of observations.
acf: Autocorrelation function starting from lag 0.
ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up the computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missing values.

Optional Arguments

neff: a precomputed effective number of observations may be provided for efficiency.

source

The default estimates the empirical autocorrelation from the given series. If possible, use a more precise estimate from a longer series. For example, when computing the daily means of an hourly time series, estimate the empirical autocorrelation from the monthly or annual series and provide it to each daily application of sem_cor via the acf argument.
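
The advice above can be sketched in plain Julia without the package. All helper names below (`acf_sketch`, `acf_effective_sketch`, `neff_sketch`, `sem_sketch`) are hypothetical, the hourly series is a simulated AR(1) process, and the sketch assumes complete data and that the standard error is the square root of the Zieba-corrected variance divided by $n_{eff}$. It is an illustration of the workflow, not the LogNormals implementation.

```julia
using Random, Statistics

# All helpers below are hypothetical illustrations, not LogNormals functions.

# Empirical autocorrelation for lags 0:maxlag, dividing by n.
function acf_sketch(x::AbstractVector{<:Real}, maxlag::Integer)
    n = length(x)
    z = x .- mean(x)
    c0 = sum(abs2, z) / n
    return [sum(z[1:n-k] .* z[1+k:n]) / n / c0 for k in 0:maxlag]
end

# Keep the leading coefficients before the first negative one.
function acf_effective_sketch(acf::AbstractVector{<:Real})
    i = findfirst(<(0), acf)
    return i === nothing ? acf : acf[1:i-1]
end

# Effective number of observations for a complete series of length n.
function neff_sketch(n::Integer, acf::AbstractVector{<:Real})
    kmax = min(n - 1, length(acf) - 1)
    s = sum(((n - k) * acf[k + 1] for k in 1:kmax); init = 0.0)
    return n / (1 + 2 / n * s)
end

# Standard error of the mean, assuming the variance correction of Zieba 2011.
function sem_sketch(x::AbstractVector{<:Real}, acf::AbstractVector{<:Real})
    n = length(x)
    neff = neff_sketch(n, acf)
    var_c = (n - 1) * neff / (n * (neff - 1)) * var(x)
    return sqrt(var_c / neff)
end

# Simulated hourly AR(1) series covering 60 days.
rng = MersenneTwister(1)
hourly = zeros(24 * 60)
for t in 2:length(hourly)
    hourly[t] = 0.7 * hourly[t - 1] + randn(rng)
end

# Estimate the autocorrelation once from the long series ...
acf_long = acf_effective_sketch(acf_sketch(hourly, 48))
# ... and reuse it for the standard error of each daily mean.
days = [hourly[(24 * (d - 1) + 1):(24 * d)] for d in 1:60]
sem_daily = [sem_sketch(day, acf_long) for day in days]
sem_naive = [std(day) / sqrt(24) for day in days]   # ignores autocorrelation
```

Because the pooled autocorrelation is positive at short lags, each entry of `sem_daily` is larger than the corresponding naive estimate.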

Variance of a correlated series

LogNormals.var_cor — Function

var_cor(x, ms::MissingStrategy=PassMissing())
var_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Estimate the variance for an autocorrelated series.

Zieba 2011 provides the following formula:

\[Var(x) = \frac{n_{eff}}{n (n_{eff}-1)} \sum \left( x_i - \bar{x} \right)^2 = {(n-1) n_{eff} \over n (n_{eff}-1)} Var_{uncor}(x)\]

Arguments

x: An iterator of a series of observations.
acf: Autocorrelation function starting from lag 0.
ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up the computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missing values.

Optional Arguments

neff: a precomputed effective number of observations may be provided for efficiency.

source
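
The second form of the formula above rescales the ordinary sample variance. A minimal sketch for a complete series, assuming $n_{eff}$ is already known; the helper name `var_cor_sketch` is hypothetical and not the package function:

```julia
using Statistics

# Hypothetical helper for illustration; not the LogNormals implementation.
# Variance of an autocorrelated series for a given effective number of observations.
function var_cor_sketch(x::AbstractVector{<:Real}, neff::Real)
    n = length(x)
    return (n - 1) * neff / (n * (neff - 1)) * var(x)   # var(x) is Var_uncor
end

x = [2.1, 2.5, 2.4, 3.0, 3.2, 2.9, 2.2, 1.9]
v_uncor = var_cor_sketch(x, length(x))   # neff == n: reduces to var(x)
v_cor = var_cor_sketch(x, 4.0)           # neff < n: inflated variance
```

The correction factor exceeds one whenever $n_{eff} < n$, so ignoring the autocorrelation underestimates the variance.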

Effective autocorrelation function

LogNormals.autocor_effective — Function

autocor_effective(x, ms::MissingStrategy=PassMissing())
autocor_effective(x, acf)

Estimate the effective autocorrelation function for series x.

Arguments

x: An iterator of a series of observations
ms: MissingStrategy passed to autocor
acf: AutocorrelationFunction starting from lag 0

Notes

The effective autocorrelation function consists of the first coefficients of the autocorrelation function, up to but not including the first negative coefficient.
According to Zieba 2011, using this effective version rather than the full version when estimating the autocorrelation function from the data yields better results for the standard error of the mean (sem_cor).
The optional argument acf allows the caller to provide a precomputed estimate of the autocorrelation function (see autocor).

source
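
The truncation rule described in the notes can be sketched in a few lines. The helper name `acf_effective_sketch` is hypothetical, and the sketch assumes the autocorrelation function is given as a vector starting at lag 0:

```julia
# Hypothetical helper illustrating the truncation rule; not the package code.
function acf_effective_sketch(acf::AbstractVector{<:Real})
    i = findfirst(<(0), acf)                  # index of the first negative coefficient
    return i === nothing ? acf : acf[1:i-1]   # keep everything before it
end

acf_full = [1.0, 0.5, 0.2, -0.1, 0.05, -0.02]
acf_eff = acf_effective_sketch(acf_full)      # keeps lags 0, 1 and 2
```

Coefficients after the first negative one are dropped even if they are positive again, as for lag 4 in this example.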

Autocorrelation of a series with missing values

StatsBase.autocor — Function

autocor(x::AbstractVector{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractVector{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    dmean::Bool=true)

Estimate the autocorrelation function accounting for missing values.

Arguments

x: series or matrix with series in columns, which may contain missing values
lags: Integer vector of the lags for which correlation should be computed
ms: MissingStrategy. Defaults to PassMissing(). Set to ExactMissing() to divide the sum in the formula of the expected value for the correlation at lag k by n - nmissing instead of n, where nmissing is the number of records with a missing value in either the original vector or its lagged version (see count_forlags).
dmean: if false, assume mean(x)==0.

If the missing strategy is set to SkipMissing(), the computation is faster, but the result is more strongly biased low with an increasing number of missing values. Note that, for reasons of numerical stability, StatsBase.autocor divides by n instead of n-k, the true length of the vectors correlated at lag k, which results in low-biased correlations at higher lags.

source
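
The effect of dividing by n instead of n-k can be illustrated for a complete series. The helper `acf_with_divisor` below is hypothetical and reimplements the empirical autocorrelation in plain Julia, so it is a sketch of the two normalizations rather than the StatsBase or LogNormals code:

```julia
using Statistics

# Hypothetical helper for a complete series; not the StatsBase or LogNormals code.
# `denom(n, k)` gives the divisor of the lag-k sum of products.
function acf_with_divisor(x::AbstractVector{<:Real}, lags, denom)
    n = length(x)
    z = x .- mean(x)
    c0 = sum(abs2, z) / n                  # variance estimate at lag 0
    return [sum(z[1:n-k] .* z[1+k:n]) / denom(n, k) / c0 for k in lags]
end

x = [sin(0.3 * t) + 0.1 * cos(2.0 * t) for t in 1:60]
lags = 0:20
acf_n = acf_with_divisor(x, lags, (n, k) -> n)        # divide by n
acf_nk = acf_with_divisor(x, lags, (n, k) -> n - k)   # divide by n - k
```

The two estimates differ by the factor (n-k)/n, so the version dividing by n shrinks the coefficients towards zero more strongly as the lag increases.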

LogNormals.count_forlags — Function

count_forlags(pred, x, lags)
count_forlag(pred, x, k::Integer)

Count the number of pairs for lag k which fulfil a predicate.

Arguments

pred::Function(x_i,x_iplusk)::Bool: The predicate to be applied to each pair
x: The series whose lags are inspected.
lags: An iterator of Integer lag sizes
k: A single lag.

A common use case is counting the pairs that contain a missing value when computing the autocorrelation, using the predicate missinginpair(x,y) = ismissing(x) || ismissing(y).

source
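
The counting can be sketched as follows. The function `count_pairs` is a hypothetical re-implementation for illustration and assumes that the pairs for lag k are (x[i], x[i+k]) for all i with both indices inside the series; the package functions may differ in detail.

```julia
# Hypothetical re-implementation for illustration; not the package code.
missinginpair(a, b) = ismissing(a) || ismissing(b)

# Count the pairs (x[i], x[i + k]) that fulfil `pred`.
count_pairs(pred, x, k::Integer) =
    count(pred(x[i], x[i + k]) for i in 1:(length(x) - k))

x = Vector{Union{Missing, Float64}}(collect(1.0:100.0))
x[11:20] .= missing
m = [count_pairs(missinginpair, x, k) for k in 1:2]   # missing pairs for lags 1 and 2
```

A gap of 10 missing records affects 11 pairs at lag 1 and 12 pairs at lag 2, because each additional lag lets one more record before the gap pair with a missing value.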