Standard error of the mean of correlated series

The standard error of the mean of a series with positive autocorrelation is larger than that of an uncorrelated series, because the variance of the mean includes positive covariance terms between records in addition to the variance of each record.

Effective number of observations

For an uncorrelated series, the variance of the mean equals the variance of a single record divided by the number of records.

\[Var(\bar{x}) = {Var(x) \over n}\]

For a correlated series, the effective number of observations is defined so that the same scaling holds.

\[Var(\bar{x}) = {Var(x) \over n_{eff}}\]

The series contains only $n_{eff}$ effective observations.

LogNormals.effective_n_cor (Function)
effective_n_cor(x, ms::MissingStrategy=PassMissing()) 
effective_n_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Compute the number of effective observations for an autocorrelated series.

Arguments

  • x: An iterator of a series of observations.
  • ms: MissingStrategy. If not given, defaults to PassMissing(). Set to ExactMissing() to consciously handle missing values in x.
  • acf: Autocorrelation function starting from lag 0. If not given, defaults to autocor(x, ms).

The formula in Zieba 2011 has been extended for missing values:

\[n_{eff} = \frac{n_F}{1+{2 \over n_F} \sum_{k=1}^{\min(n-1,n_k)} (n-k-m_k) \rho_k}\]

where $n$ is the total number of records, $n_F$ is the number of finite records, $n_k$ is the number of components in the autocorrelation function used ($n-1$ if not estimated from the data), $\rho_k$ is the correlation at lag $k$, and $m_k$ is the number of pairs that contain a missing value for lag $k$.
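The formula above can be sketched in a few lines of plain Julia. This is only an illustration, not the package's implementation: the names `count_missing_pairs` and `neff_sketch` are hypothetical, and the sketch assumes that `acf` starts at lag 0, so that $n_k$ is taken as `length(acf) - 1`.

```julia
# Hypothetical helper: number of pairs (x[i], x[i+k]) that contain a missing, i.e. m_k.
count_missing_pairs(x, k) =
    count(i -> ismissing(x[i]) || ismissing(x[i + k]), 1:(length(x) - k))

# Hypothetical sketch of the extended formula; acf[1] is assumed to be lag 0.
function neff_sketch(x, acf)
    n = length(x)                     # total number of records
    nF = count(!ismissing, x)         # number of finite records
    nk = min(n - 1, length(acf) - 1)  # highest lag entering the sum
    s = 0.0
    for k in 1:nk
        s += (n - k - count_missing_pairs(x, k)) * acf[k + 1]
    end
    return nF / (1 + 2 / nF * s)
end

x = Union{Missing,Float64}[float(i) for i in 1:100]
acf0 = [1.0, 0.4, 0.1]
neff_full = neff_sketch(x, acf0)  # no missings, n_F == n == 100: about 50.3
x[11:20] .= missing
neff_gaps = neff_sketch(x, acf0)  # n_F == 90, m_1 == 11, m_2 == 12: about 45.6
```

Only the positions of the missing values and the given autocorrelation coefficients enter this sketch; the observed values themselves do not.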

Details

Missing values are not handled by default, i.e. the number of effective observations is missing if there are any missings in x. The recommended way is to use ExactMissing(). Alternatively, set ms to SkipMissing() to speed up computation (by internally omitting the count_forlags counting of missing pairs) at the cost of a positively biased result, with the bias increasing with the number of missings. This bias in turn leads to an underestimated uncertainty of the sum or the mean.

Examples

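using Distributions, DistributionVectors, Missings, MissingStrategies, LinearAlgebra
using LogNormals  # provides effective_n_cor
acf0 = [1,0.4,0.1]
# correlation matrix of 100 records implied by the autocorrelation function
Sigma = cormatrix_for_acf(100, acf0);
# 100 random variables each Normal(1,1)
dmn = MvNormal(ones(100), Symmetric(Sigma));
x = allowmissing(rand(dmn));
# introduce a gap of 10 missing records
x[11:20] .= missing
neff = effective_n_cor(x, acf0, ExactMissing())
# fewer effective observations than the 90 finite records
neff < 90
neff_biased = effective_n_cor(x, acf0, SkipMissing())
# SkipMissing() overestimates the number of effective observations
neff_biased > neff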

Standard error of the mean of a correlated series

LogNormals.sem_cor (Function)
sem_cor(x, ms::MissingStrategy=PassMissing())
sem_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Estimate the standard error of the mean of an autocorrelated series: $Var(\bar{x}) = {Var(x) \over n_{eff}}$.

Arguments

  • x: An iterator of a series of observations.
  • acf: Autocorrelation function starting from lag 0.
  • ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missings.

Optional Arguments

  • neff: A precomputed number of effective observations, which may be provided for efficiency.

By default, the empirical autocorrelation is estimated from the given series. If possible, use a more precise estimate from a longer series. For example, when computing the daily means of an hourly time series, estimate the empirical autocorrelation from the monthly or annual series and provide it to the daily applications of sem_cor using argument acf.
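The identity $Var(\bar{x}) = {Var(x) \over n_{eff}}$ behind sem_cor can be illustrated with a small, self-contained sketch. The helper `sem_sketch` is hypothetical and takes the number of effective observations as given; it uses the plain sample variance, whereas the package may combine this with var_cor.

```julia
using Statistics

# Hypothetical helper: standard error of the mean for a given number of effective observations.
sem_sketch(x, neff) = sqrt(var(x) / neff)

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
sem_uncor = sem_sketch(x, length(x))  # neff == n recovers the classical std(x) / sqrt(n)
sem_autocor = sem_sketch(x, 4.0)      # fewer effective observations give a larger standard error
```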

Variance of a correlated series

LogNormals.var_cor (Function)
var_cor(x, ms::MissingStrategy=PassMissing())
var_cor(x, acf::AbstractVector, ms::MissingStrategy=PassMissing())

Estimate the variance for an autocorrelated series.

Zieba 2011 provides the following formula:

\[Var(x) = \frac{n_{eff}}{n (n_{eff}-1)} \sum \left( x_i - \bar{x} \right)^2 = {(n-1) n_{eff} \over n (n_{eff}-1)} Var_{uncor}(x)\]

Arguments

  • x: An iterator of a series of observations.
  • acf: Autocorrelation function starting from lag 0.
  • ms: MissingStrategy passed to effective_n_cor. SkipMissing() speeds up computation compared to ExactMissing(), but leads to a negatively biased result, with the absolute value of the bias increasing with the number of missings.

Optional Arguments

  • neff: A precomputed number of effective observations, which may be provided for efficiency.
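The variance formula above can be checked with a short sketch for a series without missing values. The helper `var_cor_sketch` is hypothetical and takes $n_{eff}$ as an input instead of estimating it.

```julia
using Statistics

# Hypothetical helper implementing the formula above for a series without missings.
function var_cor_sketch(x, neff)
    n = length(x)
    return neff / (n * (neff - 1)) * sum(abs2, x .- mean(x))
end

x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
v_uncor = var_cor_sketch(x, length(x))  # neff == n reduces to the usual sample variance
v_cor = var_cor_sketch(x, 4.0)          # neff < n inflates the variance estimate
```

With $n_{eff} = n$ the correction factor ${(n-1) n_{eff} \over n (n_{eff}-1)}$ equals one, so the sketch returns the ordinary sample variance in that case.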

Effective autocorrelation function

LogNormals.autocor_effective (Function)
autocor_effective(x, ms::MissingStrategy=PassMissing())
autocor_effective(x, acf)

Estimate the effective autocorrelation function for series x.

Arguments

  • x: An iterator of a series of observations.
  • ms: MissingStrategy passed to autocor.
  • acf: Autocorrelation function starting from lag 0.

Notes

  • The effective autocorrelation function consists of the leading coefficients of the autocorrelation function before the first negative coefficient.
  • According to Zieba 2011, using this effective version rather than the full version when estimating the autocorrelation function from the data yields better results for the standard error of the mean (sem_cor).
  • Optional argument acf allows the caller to provide a precomputed estimate of the autocorrelation function (see autocor).
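The truncation described in the notes can be sketched as follows. The function name `autocor_effective_sketch` is hypothetical, and the sketch operates on an already estimated autocorrelation function rather than on the series itself.

```julia
# Hypothetical sketch: keep the coefficients before the first negative one.
function autocor_effective_sketch(acf::AbstractVector)
    i = findfirst(<(0), acf)
    return i === nothing ? acf : acf[1:(i - 1)]
end

acf_full = [1.0, 0.4, 0.1, -0.05, 0.02]
acf_eff = autocor_effective_sketch(acf_full)  # keeps lags 0 to 2: [1.0, 0.4, 0.1]
```

Coefficients after the first negative one are dropped even if they are positive again, as the last entry in this example shows.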

Autocorrelation of a series with missing values

StatsBase.autocor (Function)
autocor(x::AbstractVector{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    demean::Bool=true)
autocor(x::AbstractVector{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    demean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, ms::MissingStrategy=PassMissing();
    demean::Bool=true)
autocor(x::AbstractMatrix{<:Union{Missing,Real}}, lags, ms::MissingStrategy=PassMissing();
    demean::Bool=true)

Estimate the autocorrelation function accounting for missing values.

Arguments

  • x: A series, or a matrix with series in columns, which may contain missing values.
  • lags: Integer vector of the lags for which the correlation should be computed.
  • ms: MissingStrategy. Defaults to PassMissing(). Set to ExactMissing() to divide the sum in the expected-value formula for the correlation at lag k by n - nmissing instead of n, where nmissing is the number of records with a missing value in either the original vector or its lagged version (see count_forlags).
  • demean: If false, assume mean(x)==0.

If the missing strategy is set to SkipMissing(), the computation is faster, but the result is biased low, and more strongly so with an increasing number of missings. Note that, for numerical stability reasons, StatsBase.autocor divides by n instead of n-k, the true length of the vectors correlated at lag k, which results in low-biased correlations at higher lags.

LogNormals.count_forlags (Function)
count_forlags(pred, x, lags)
count_forlag(pred, x, k::Integer)

Count the number of pairs at lag k that fulfil a predicate.

Arguments

  • pred::Function(x_i,x_iplusk)::Bool: The predicate to be applied to each pair
  • x: The series whose lags are inspected.
  • lags: An iterator of Integer lag sizes
  • k: A single lag.

A common case is counting the pairs that contain a missing value when computing the autocorrelation, using the predicate missinginpair(x,y) = ismissing(x) || ismissing(y).
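The counting can be pictured with the following sketch in plain Julia. The `_sketch` function names are hypothetical stand-ins for illustration and do not reproduce the package's implementation; only the predicate is taken from the text above.

```julia
# Predicate from the text above: true if either element of the pair is missing.
missinginpair(x, y) = ismissing(x) || ismissing(y)

# Hypothetical sketch: count the pairs (x[i], x[i+k]) that fulfil pred for a single lag k.
count_forlag_sketch(pred, x, k::Integer) =
    count(i -> pred(x[i], x[i + k]), 1:(length(x) - k))

# Hypothetical sketch: apply the single-lag count to each lag in an iterator of lags.
count_forlags_sketch(pred, x, lags) = [count_forlag_sketch(pred, x, k) for k in lags]

x = [1.0, missing, 3.0, 4.0, missing, 6.0]
m = count_forlags_sketch(missinginpair, x, 1:2)  # [4, 2]
```

At lag 1, four of the five pairs contain a missing value; at lag 2, two of the four pairs do.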
