undiff_af {bnpsd}  R Documentation 
Description:

This function takes a vector of allele frequencies and a mean kinship value, and returns a new distribution of allele frequencies consistent with reversing differentiation under the given kinship, in the sense that the new distribution is more concentrated around the middle (0.5) than the input/original by an amount predicted from theory. The new distribution is created by weighting the input distribution with a random mixing distribution of lower variance. An automatic method is provided that always selects a Beta distribution with just the right concentration to work for the given data and kinship. Explicit methods are also provided for more control, but they are more likely to result in errors when mixing variances are not small enough (see details below).
Usage:

undiff_af(
    p,
    kinship_mean,
    distr = c("auto", "uniform", "beta", "point"),
    alpha = 1,
    eps = 10 * .Machine$double.eps
)
Arguments:

p: A vector of observed allele frequencies.

kinship_mean: The mean kinship value of the differentiation to reverse.

distr: Name of the mixing distribution to use.

alpha: Shape parameter for the symmetric Beta mixing distribution, used when distr = "beta" (ignored otherwise).

eps: If distr = "auto", this small value is added to the automatically chosen alpha so that the required variance inequality holds despite limited numerical precision.
Details:

Model: Suppose we started from an allele frequency p0 with expectation 0.5 and variance V0. Differentiation creates a new (sample) allele frequency p1 without changing its mean (E(p1 | p0) = p0) and with a conditional variance given by the mean kinship phi: Var(p1 | p0) = p0 * (1 - p0) * phi. The total variance of the new distribution (calculated using the law of total variance) equals V1 = Var(p1) = phi/4 + (1 - phi) * V0. (Also E(p1) = 0.5.) So the new variance is greater for phi > 0 (note V0 <= 1/4 for any distribution bounded in (0, 1)). Thus, given V1 and phi, the goal is to construct a new distribution with the original, lower variance of V0 = (V1 - phi/4) / (1 - phi). An error is thrown if V1 < phi/4 in the input data, which is inconsistent with this assumed model.
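The variance relation above can be checked by simulation; a minimal sketch (variable names are illustrative, and the conditional draw is a Balding-Nichols-style Beta with the assumed mean and variance):

```r
# sketch: check V1 = phi/4 + (1 - phi) * V0 numerically
m <- 1e6
phi <- 0.2
# ancestral frequencies with expectation 0.5 and variance V0
p0 <- runif( m )   # uniform on (0, 1), so V0 = 1/12
V0 <- 1 / 12
# conditional draw with mean p0 and variance p0 * (1 - p0) * phi
nu <- 1 / phi - 1
p1 <- rbeta( m, p0 * nu, (1 - p0) * nu )
# law-of-total-variance prediction vs sample variance (about 0.5)
V1_theory <- phi / 4 + (1 - phi) * V0
V1_sample <- mean( ( p1 - 0.5 )^2 )
# the two agree up to sampling noise
```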
Construction of "undifferentiated" allele frequencies: p_out = w * p_in + (1 - w) * p_mix, where p_in is the input with sample variance V_in (V1 in the above model) and p_mix is a random draw from the mixing distribution distr with expectation 0.5 and known variance V_mix. The output variance is V_out = w^2 * V_in + (1 - w)^2 * V_mix, which we set to the desired V_out = (V_in - phi/4) / (1 - phi) (V0 in the above model) and solve for w (the largest of the two quadratic roots is used). An error is thrown if V_mix > V_out (the output variance must be larger than the mixing variance). This error is avoided by manually adjusting the choice of distr and alpha (for distr = "beta"), or preferably with distr = "auto" (default), which selects a Beta distribution with alpha = (1/(4 * V_out) - 1)/2 + eps that is guaranteed to work for any valid V_out (assuming V_in > phi/4).
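The weight solution follows directly from the quadratic above; the sketch below is an illustrative reimplementation (function names are hypothetical, not the package's internal code):

```r
# sketch: solve w^2 * V_in + (1 - w)^2 * V_mix = V_out for the weight w
# rearranged quadratic: (V_in + V_mix) * w^2 - 2 * V_mix * w + (V_mix - V_out) = 0
undiff_weight <- function( V_in, V_mix, V_out ) {
    a <- V_in + V_mix
    b <- -2 * V_mix
    c <- V_mix - V_out
    # the largest of the two roots is used
    ( -b + sqrt( b^2 - 4 * a * c ) ) / ( 2 * a )
}

# "auto" choice: a symmetric Beta(alpha, alpha), whose variance is
# 1 / ( 4 * ( 2 * alpha + 1 ) ), tuned to fall just below V_out
alpha_auto <- function( V_out, eps = 10 * .Machine$double.eps )
    ( 1 / ( 4 * V_out ) - 1 ) / 2 + eps
```

Since Beta(alpha, alpha) has variance 1/(4 * (2 * alpha + 1)), adding eps to the exact solution makes V_mix strictly smaller than V_out, which keeps the discriminant of the quadratic non-negative.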
Value:

A list with the new distribution and several other informative statistics, as named elements:

p: A new vector of allele frequencies with the same length as the input p, with the desired variance (see details) obtained by weighting the input p with new random data from distribution distr.

w: The weight used for the input data (1 - w for the mixing distribution).

kinship_mean_max: The maximum mean kinship possible for undifferentiating this data, which equals four times the input variance (see details) and results in zero output variance.

V_in: Sample variance of the input p, assuming its expectation is 0.5.

V_out: Target variance of the output p.

V_mix: Variance of the mixing distribution.

alpha: The value of alpha used for the symmetric Beta mixing distribution, informative if distr = "auto".
Examples:

# create random uniform data for these many loci
m <- 100
p <- runif( m )

# differentiate the distribution using the Balding-Nichols model
F <- 0.1
nu <- 1 / F - 1
p2 <- rbeta( m, p * nu, (1 - p) * nu )

# now undifferentiate with this function
# NOTE in this particular case `F` is also the mean kinship
# default "automatic" distribution recommended
# (avoids possible errors for specific distributions)
p3 <- undiff_af( p2, F )$p

# note p3 does not equal p exactly (original is unrecoverable),
# but the variances (assuming expectation is 0.5 for all) should be
# close to each other, and both be lower than p2's variance:
V1 <- mean( ( p - 0.5 )^2 )
V2 <- mean( ( p2 - 0.5 )^2 )
V3 <- mean( ( p3 - 0.5 )^2 )
# so p3 is stochastically consistent with p as far as the variance is concerned