There appears to be a bug with `ace(...,model="ARD")`

.
Reversible models work fine with `ace`

, ape's function for
ancestral state estimation; however with asymmetric models - in which
*q*_{i,j} and *q*_{j,i} are not necessarily
equal for all *i* & *j* - something seems to be awry.

I will report this to Emmanuel, of course - but I thought that I would post the demo so that he (& others) can see what I mean.

I'm going to use an empirical dataset that was shared with me to help
identify the problem; however I have removed all identifying information
from the phylogeny & trait data. Note that the issue is not always
nearly as readily identifiable as it is in this case. Note also that it
does not have to do with model fitting, as the same non-reversible model
is fit (more or less - to a reasonable standard of numerical precision)
by `ace`

, `make.simmap`

, and diversitree. Rather
the issue seems to be in the calculation of the marginal ancestral states
given a particular fitted model.

```
library(phytools)
library(diversitree)
## SYM model
trees.SYM<-make.simmap(phy,x,model="SYM",nsim=500) ## make simmap
```

```
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## 0 1
## 0 -0.005818548 0.005818548
## 1 0.005818548 -0.005818548
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## 0 1
## 0.5 0.5
```

```
## Done.
```

```
smp.SYM<-summary(trees.SYM) ## PP from stochastic mapping
fit.SYM<-ace(x,phy,type="discrete",model="SYM") ## marginal ASR
## ARD model
trees.ARD<-make.simmap(phy,x,model="ARD",nsim=500) ## make.simmap
```

```
## make.simmap is sampling character histories conditioned on the transition matrix
##
## Q =
## 0 1
## 0 -0.026275656 0.026275656
## 1 0.008346216 -0.008346216
## (estimated using likelihood);
## and (mean) root node prior probabilities
## pi =
## 0 1
## 0.5 0.5
```

```
## Done.
```

```
smp.ARD<-summary(trees.ARD) ## PP from stochastic mapping
fit.ARD<-ace(x,phy,type="discrete",model="ARD") ## marginal ASR
## compare each one
par(mfrow=c(1,2))
plot(fit.SYM$lik.anc,smp.SYM$ace,main="model=\"SYM\"",
xlab="Marginal ancestral states | ace",
ylab="PP from stochastic mapping | make.simmap")
lines(c(0,1),c(0,1),lty="dashed",col="red")
plot(fit.ARD$lik.anc,smp.ARD$ace,main="model=\"ARD\"",
xlab="Marginal ancestral states | ace",
ylab="PP from stochastic mapping | make.simmap")
lines(c(0,1),c(0,1),lty="dashed",col="red")
```

These should both be the same - on average, that is. Stochastic mapping is, as the name implies, a stochastic procedure, so we might expect small differences. On average, and over a large number of replicates (as in this case; however) they should match.

It's hard to say what's wrong, here, so I thought I'd compare to diversitree
which can also fit the M*k* model & compute ancestral states:

```
## now compare both to diversitree
lik<-make.mk2(phy,x) ## make likelihood function
## first "SYM"
lik.sym<-constrain(lik,q01~q10)
fitted.sym<-find.mle(lik.sym,x.init=0.1)
asr.sym<-asr.marginal(lik.sym,pars=fitted.sym$par)
## next "ARD"
fitted.ard<-find.mle(lik,x.init=c(0.1,0.1)) ## also tried root=ROOT.FLAT
asr.ard<-asr.marginal(lik,pars=fitted.ard$par)
## compare back to ace & make.simmap
par(mfrow=c(2,2))
plot(t(asr.sym),fit.SYM$lik.anc,
main="diversitree vs. ace\nmodel=\"SYM\"",
xlab="Marginal states | diversitree",
ylab="Marginal states | ace")
lines(c(0,1),c(0,1),lty="dashed",col="red")
plot(t(asr.sym),smp.SYM$ace,
main="diversitree vs. make.simmap\nmodel=\"SYM\"",
xlab="Marginal states | diversitree",
ylab="PP from stochastic mapping | make.simmap")
lines(c(0,1),c(0,1),lty="dashed",col="red")
plot(t(asr.ard),fit.ARD$lik.anc,
main="diversitree vs. ace\nmodel=\"ARD\"",
xlab="Marginal states | diversitree",
ylab="Marginal states | ace")
lines(c(0,1),c(0,1),lty="dashed",col="red")
plot(t(asr.ard),smp.ARD$ace,
main="diversitree vs. make.simmap\nmodel=\"ARD\"",
xlab="Marginal states | diversitree",
ylab="PP from stochastic mapping | make.simmap")
lines(c(0,1),c(0,1),lty="dashed",col="red")
```

So it seems that `make.simmap`

and diversitree agree for
`model="ARD"`

(remember, `make.simmap`

is a stochastic
method, so we don't expect them to match *exactly*, except as our
number of simulations going to infinity), but that `ace`

does not.
That's all I can say at the moment about this, but it is probably worth
looking into!

For full disclosure, this issue (identified as a possible error with
`make.simmap`

) was discovered by someone else & pointed out to me.
(Also, initially I didn't believe it!)

## No comments:

## Post a Comment

Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.