Monday, March 11, 2019

Projecting a phylogeny onto a geographic map showing species ranges in R

In the past couple of days I have posted a few times (1, 2, 3) about projecting a phylogeny to a geographic map in which we had multiple observations per species.

Today, I thought it might be neat to demonstrate what we can do if we have lots of observations per species and what to project a tree onto estimated range areas for each species. Note that to duplicate this post exactly it is necessary to update phytools from GitHub.

First, let's start with a fictional dataset from Brazil:

## Phylogenetic tree with 8 tips and 7 internal nodes.
## Tip labels:
##  I.ike, I.zawsgyqmd, C.mkgiraxh, I.nqxrdes, T.zfrbt, F.uhlvwd, ...
## Rooted; includes branch lengths.
##                  lat      long
## C.mkgiraxh -5.923737 -50.30310
## C.mkgiraxh -6.961451 -51.40991
## C.mkgiraxh -6.105770 -51.44966
## C.mkgiraxh -6.133681 -49.31009
## C.mkgiraxh -7.022131 -53.61715
## C.mkgiraxh -4.509163 -49.00638
##                lat      long
## T.zfrbt  -9.086615 -49.16497
## T.zfrbt  -8.874021 -52.17523
## T.zfrbt  -7.510258 -52.21743
## T.zfrbt  -8.656923 -51.57321
## T.zfrbt -10.740220 -51.95592
## T.zfrbt  -8.993945 -50.38208

Now, let's plot these data just as we did yesterday. Again, I'll use the viridis package (which I just discovered) for colors:

##       I.ike I.zawsgyqmd  C.mkgiraxh   I.nqxrdes     T.zfrbt    F.uhlvwd 
## "#277F8EFF" "#46337EFF" "#9FDA3AFF" "#1FA187FF" "#440154FF" "#FDE725FF" 
##      T.etng   K.jzwgtku 
## "#365C8DFF" "#4AC16DFF"
## objective: 24
## objective: 12
## objective: 12
## objective: 6
## objective: 4
## objective: 2
## objective: 2

plot of chunk unnamed-chunk-3

That's actually pretty cool looking, but maybe instead of plotting all the points we'd like to plot shapes. One way we could do this is by estimating a convex hull (MCP) for each taxon, and then overlay those at the end:

for(i in 1:Ntip(Brazil.tree)){
    mcp<-if(i==1) spr[chull(spr),] else rbind(mcp,spr[chull(spr),])
for(i in 1:Ntip(Brazil.tree)){

plot of chunk unnamed-chunk-4

Or, alternatively, we could plot the shapes without the points:

## objective: 24
## objective: 12
## objective: 12
## objective: 6
## objective: 4
## objective: 2
## objective: 2
for(i in 1:Ntip(Brazil.tree)){

plot of chunk unnamed-chunk-5

That's not bad.


  1. First, thank you for creating and demonstrating this wonderful package. Great visualization of data!

    I am trying to implement this pipeline myself. I have a phylogeny and coordinates for each species in the phylogeny. Unfortunately, I am getting an error with
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    objective: 0
    Error in coords[cw$tip.label, 2:1] : subscript out of bounds

    I am sure the names in the phylogeny match the names in the coordinate data set and they are all characters. Any advice would be helpful.

    Thank you for your time,

    1. I figured out my problem! I need a matrix for the coordinates, not a data frame.

    2. @kwgray Hello, How did you resolve it?

    3. Dr. Revell, thank you for your answer, I appreciate very much your work, It's amazing! Thank you
      as.matrix Didn't work :(
      > lat.long=read.table(file.choose(), sep=',', head=T, rownames=NULL)
      > head(lat.long)
      X lat long X.1 X.2 X.3 X.4 X.5 X.6 X.7 X
      1 acutus 19.36117 -101.5737 NA NA NA NA NA NA NA
      2 acutus 19.51342 -101.6574 NA NA NA NA NA NA NA
      3 acutus 19.51342 -101.6574 NA NA NA NA NA NA NA
      4 acutus 19.46861 -101.5619 NA NA NA NA NA NA NA
      5 acutus 19.46861 -101.5619 NA NA NA NA NA NA NA
      6 acutus 19.46861 -101.5619 NA NA NA NA NA NA NA
      >> x=as.vector(lat.long$X)
      > x
      [1] "acutus" "acutus" "acutus" "acutus" "acutus"
      [6] "acutus" "acutus" "acutus" "acutus" "acutus"
      [11] "acutus" "acutus" "acutus" "acutus" "acutus"
      > data=as.matrix(lat.long[2:3])
      > head(data)
      lat long
      [1,] 19.36117 -101.5737
      [2,] 19.51342 -101.6574
      [3,] 19.51342 -101.6574
      [4,] 19.46861 -101.5619
      [5,] 19.46861 -101.5619
      [6,] 19.46861 -101.5619
      > is.matrix(data)
      [1] TRUE
      > coords=cbind(x,data)
      > head(coords)
      x lat long
      [1,] "acutus" "19.3611667" "-101.5736667"
      [2,] "acutus" "19.5134167" "-101.6574444"
      [3,] "acutus" "19.5134167" "-101.6574444"
      [4,] "acutus" "19.4686056" "-101.56195"
      [5,] "acutus" "19.4686056" "-101.56195"
      [6,] "acutus" "19.4686056" "-101.56195"
      > is.matrix(coords)
      [1] TRUE
      Error in coords[cw$tip.label, 2:1] : subscript out of bounds

    4. The problem may be that your matrix should have the species names as row names, not as a separate column. Try assigning the species names as the row names of your matrix & see if that works.

    5. Thank you, Dr Revell, but How I can do that?

    6. In your example:


      All the best, Liam

    7. Hi Liam. I'm having a similar problem. I have multiple coordinates per species, so when I try to assign the rownames() to the matrix it ads a sequential number to the names (e.g. species.1, especies.2, ect.), this ends in a "objective: 0". Any recommendation to deal with the duplicates in the row names of a matrix?

    8. Andrés, did you try converting your data frame of coordinates to a matrix as I advised above (this can be done using the function as.matrix)? This is most likely your problem

  2. Hi Liam, those polygons are simply beautiful!
    I have some questions about object and its posterior plotting:
    I am trying to plot single locations of some primate species on a map, but I'd like to plot just Africa, Asia and some parts of Oceania. I've seen your example pruning Isla de Pascua from a Chile map, but that method doesn't seem to be working. I've checked every region included in the map and its respective code, and unsuccessfully tried to supress the ones I didn't need. Do you know how could I eliminate America and Europe from the world map? (Virtually speaking of course haha). This is some example of what I've been trying:

    plotTree(tree, ftype = "i", lwd = 0.5)

    geo<-read.csv("Locations.csv", h = T, row.names = 1)

    match<, geo)
    #[1] "Dropping tips from the tree because they are not present in the data:"
    #[1] "Homo_sapiens"*

    obj<,geo,regions=".",direction="rightwards", plot = FALSE)

    ii<-which(obj$map$x<c(2,3,4,11,12,13,14,15,16,21,22,23,24,139,140,141,142,143,144,145,146,147,148,149,150,151,152,153,154,155,156,157,158,159,160,161,162,163,164,165,166,167,168,169,170,171,172,173,174,175,176,177,178,179,180,182,183,184,185,186,188,189,190,191,192,193,194,195,196,198,240,241,242,243,244,418,419,420,421,422,423,424,425,426,427,428,429,430,431,432,433,434,435,439,440,441,442,443,444,445,446,447,448,449,471,486,496,497,498,499,516,569,593,595,596,597,598,599,600,601,602,603,604,605,606,668,669,670,692,693,694,695,696,697,698,699,700,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,721,722,723,724,725,726,727,728,729,730,731,732,733,734,735,736,737,738,739,740,741,742,743,744,745,746,747,748,749,750,751,752,753,754,755,756,757,758,759,760,761,762,763,764,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,792,793,794,795,796,797,798,799,800,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,819,820,821,822,823,824,826,827,828,829,830,831,832,833,834,835,836,837,838,839,846,847,848,849,850,851,862,863,864,865,866,867,868,869,870,871,872,873,874,875,876,877,878,879,880,881,882,883,884,885,886,887,888,889,890,891,892,893,894,895,896,897,898,899,900,901,902,903,904,905,906,907,908,909,910,932,933,934,935,936,937,938,939,940,941,942,944,945,946,947,948,949,952,953,954,955,960,962,963,964,965,966,967,968,969,992,995,996,997,998,999,1000,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,1013,1015,1022,1023,1024,1025,1026,1028,1029,1030,1031,1032,1033,1034,1035,1036,1037,1038,1039,1040,1041,1048,1050,1051,1098,1113,1114,1115,1116,1117,1125,1126,1127,1128,1129,1130,1131,1132 #And so on#))

    plot(obj,direction="rightwards",fsize = 0.4, ftype="i", lwd = 0.3)

  3. Great contribution, as usual, Liam. Have you had any requests to apply colors across clades, rather than across tips? From the figures I've seen so far, each tip receives a unique color. In the polygon figure above, the individual coordinate info is lost due to the collapsing of the clades. Cheers.

    1. This should already be possible. I'll try to remember to post a tutorial showing how when I get a chance.

  4. Hi Liam! thanks for your great contribution! I was trying to use with different map projections (E.g. Lambert) available in the package maps, with no success. Is using a particular projection as a default?

  5. Hi Liam,

    I have been using phytools for a while now and it really is a fantastic tool.

    Is there any method of making the colouration for each point dependant on the clade it is with. I currently have a phylogeny containing multiple samples from seperate species, and rather than colouring each sample indivudually I am looking to group colour tips/samples within certain nodes.

    Is this possible?



  6. Cheers, Liam. How are you? I am trying to project a phylogeny onto a geographic map showing species ranges, but I always obtain the following error:
    Error in res[edge[i, 2]] <- res[edge[i, 1]] + el[i] :
    replacement has length zero

    Here my procedure:
    plebejum.tree <- read.tree(file = 'exp6.tre')
    plot(ladderize(plebejum.tree), edge.width = 3)

    outgroups <- c('B.sanctulum_var1', 'B.sanctulum', 'B.denticolle',
    'B.denticolle_var1', 'B.denticolle_var2', 'B.amazonum', 'B.amazonum_var1',
    'B.minor', 'B.minor_var1', 'B.oxyurum', 'B.oxyurum_var1',
    'B.horvathi', 'B.candidulum', 'B.candidulum_var1')
    plebejum <- drop.tip(plebejum.tree, outgroups)
    Phylogenetic tree with 16 tips and 15 internal nodes.

    Tip labels:
    B.parvum, B.micantulum, B.micantulum_var1, B.estevezae, B.lariversi, B.lariversi_var1, ...

    Rooted; no branch lengths.

    plot(ladderize(plebejum), edge.width = 3)
    todos <- read.csv(file.choose(), h = T, sep = ',', dec = '.')
    species Latitude Longitude
    1 B.estevezae -10.750000 -50.92000
    2 B.estevezae -29.430631 -53.25316
    3 B.estevezae -30.334212 -54.36244
    4 B.lariversi -6.660000 -69.87000
    5 B.lariversi -10.469700 -50.50310
    6 B.lariversi_var1 -17.340000 -44.93000
    7 B.micantulum -1.450000 -48.37000
    8 B.micantulum -7.500000 -65.90000
    9 B.micantulum -6.838610 -35.12610
    10 B.micantulum -7.115000 -34.86310
    11 B.micantulum -11.858100 -55.50560
    12 B.micantulum -18.755000 -40.89080
    13 B.micantulum -19.391100 -40.07220
    14 B.micantulum -22.382300 -41.77540
    15 B.micantulum_var1 -7.700000 -57.60000

    lat <- todos[,2]
    long <- todos[,3]
    names(lat) <- todos$species
    names(long) <- todos$species
    locais <- cbind(lat, long)
    lat long
    B.estevezae -10.750000 -50.92000
    B.estevezae -29.430631 -53.25316
    B.estevezae -30.334212 -54.36244
    B.lariversi -6.660000 -69.87000
    B.lariversi -10.469700 -50.50310
    B.lariversi_var1 -17.340000 -44.93000
    B.micantulum -1.450000 -48.37000
    B.micantulum -7.500000 -65.90000
    B.micantulum -6.838610 -35.12610
    B.micantulum -7.115000 -34.86310
    B.micantulum -11.858100 -55.50560
    B.micantulum -18.755000 -40.89080

    cols<-setNames(sample(viridis(n = Ntip(plebejum))),
    obj <- = plebejum, coords = locais, database = "worldHires",
    regions = c("Brazil", 'Argentina', 'Colombia', 'Venezuela',
    'Peru', 'Paraguay', 'Bolivia', 'Chile', 'Uruguay',
    'Suriname', 'Guyana', 'Ecuador'),
    plot = FALSE, direction = "rightwards")

    plot(obj, direction="rightwards",
    colors = cols, cex.points = c(0, 1), lwd = c(3, 1), ftype = "i")

    The problem is that the tree is not staying together with the map of Brazil. Thank you in advance for considering this matter. My best regards, José Ricardo.

  7. Hi Liam,

    I’ve encountered a problem I’ve not seen others have.

    I can reproduce your example almost perfectly using my own data; tree, map and links all present and correct when plotted.

    However, I only get one tip label displayed (the first alphabetically), and all others blank, but spaced correctly based on where the links start. I get this error when I plot:

    Error in colors[cw$tip.label, ] : subscript out of bounds

    I can’t get to the bottom of it, might you have an idea?

    Thank you in advance, and thanks for phytools, such a great package.

    All the best,

    1. Hi Chris.

      Is it possible that your input lat/long data are in a data frame, instead of a matrix?

      Alternatively, the row labels may not match the tip labels of the tree. This would also explain the error.

      Please let me know what you find.

      All the best, Liam

    2. Hi Liam,

      Thank you. The issue was indeed the coordinate matrix row names not matching the tip label order. I was plotting only one clade of a tree, and so subsetting the matrix.

      Curious that the links mapped and were plotted correctly, just not the tip labels...

      All the best,

  8. hello Liam,
    thanks by the script. do you have ideia how to make a pieplot of genetic data in the map?

    all the best!

    1. Sure. The coordinate system of a plotted map is just the decimal degree system, so if you know the lat/long of your pies you can just add them using floating.pie in the plotrix package.

  9. Hi Liam,

    It is possible to used my own shapefiles instead of the info available in mapdata? For example, ecoregions, or any other shapefile? Or maybe just add more spatial info to the map create with

    And thanks for the updates and the awesome package.

  10. Hi Liam,
    I have a similar question as Hugo, is it possible to add something like rivers to the map? Or to use custom shapefiles?

    Thanks! This is a great package and the blog posts are really helpful.



Note: due to the very large amount of spam, all comments are now automatically submitted for moderation.