The Relationship between Body Size and Lifespan

Introduction

While aging is an inevitable process for most species, lifespans vary enormously across the Tree of Life, ranging from a few days to several millennia. For researchers interested in the fundamental biology of aging, identifying which aspects of an organism’s biology correlate with lifespan is an important first step on the path to concrete explanations of longevity.
For example, in 1975 Dr. Richard Peto published a paper pointing out that the cancer rates of humans and mice do not scale with their very different body sizes and lifespans. This became known as Peto’s Paradox. The original expectation was that over a lifetime, every cell accumulates mutations that could eventually make it cancerous, so an animal with more cells should face an even higher lifetime risk of cancer. In fact, across species there appears to be no relationship between body size or lifespan and cancer risk, which is the observation that underlies my own research!
As we will explore in this section, this paradox is further complicated by another unexpected relationship: larger animals also tend to live longer, which should give their many cells even more time to accumulate mutations.

Graphing the Data

For this analysis, we will use the AnAge database of ageing and life history in animals. This database has entries for over 4,200 species, nearly all of them animals (plus one plant and three fungi), with data such as maximum lifespan, growth rates, weights at different life stages, descriptions, and metabolism, among other things.

First, let’s take a look at the data itself:

# These are the packages we will be using in this analysis
library(tidyverse)
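# Hide readr's printed column specification when reading files
options(readr.num_columns = 0)
library(ggpubr)  # publication-style themes: theme_pubclean(), labs_pubr()
library(plotly)  # interactive plots via ggplotly()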
# Read the data into a dataframe:
anage <- read_tsv("data/anage_build14.txt", 
                  col_names = T, 
                  col_types = list(
                    "References" = col_character(),   # Needs to be specified, or else it's interpreted as <int>
                    "Sample size" = col_factor(c("tiny", "small", "medium", "large", "huge"), ordered = T),
                    "Data quality" = col_factor(c("low", "questionable", "acceptable", "high"), ordered = T)
                    )
                  )
## Warning: 1787 parsing failures.
##  row                  col           expected actual                     file
## 1004 Growth rate (1/days) 1/0/T/F/TRUE/FALSE  0.212 'data/anage_build14.txt'
## 1005 Growth rate (1/days) 1/0/T/F/TRUE/FALSE  0.225 'data/anage_build14.txt'
## 1006 Growth rate (1/days) 1/0/T/F/TRUE/FALSE  0.258 'data/anage_build14.txt'
## 1011 Growth rate (1/days) 1/0/T/F/TRUE/FALSE  0.126 'data/anage_build14.txt'
## 1012 Growth rate (1/days) 1/0/T/F/TRUE/FALSE  0.154 'data/anage_build14.txt'
## .... .................... .................. ...... ........................
## See problems(...) for more details.
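These parsing failures come from readr guessing each column's type from the first rows of the file (up to 1,000 by default in readr 1.x). The Growth rate (1/days) column is empty in those rows, so readr guessed it as logical (the str() output confirms col_logical), and every later number such as 0.212 failed to parse and became NA. A minimal sketch on a tiny hypothetical file (species names and values are made up) reproduces the effect by declaring the column logical, as readr's guess did, and shows that declaring it col_double() keeps the values:

```r
library(readr)

# Hypothetical toy file: a numeric column whose first rows are empty,
# mimicking "Growth rate (1/days)" in AnAge
toy_path <- tempfile(fileext = ".txt")
writeLines(c(
  "Species\tGrowth rate (1/days)",
  "species A\t",
  "species B\t",
  "species C\t0.212",
  "species D\t0.225"
), toy_path)

# Declaring the column logical (what readr guessed for AnAge) turns every
# number into NA and records a parsing problem for each one
as_logical <- suppressWarnings(read_tsv(
  toy_path,
  col_types = cols(
    Species = col_character(),
    `Growth rate (1/days)` = col_logical()
  )
))

# Declaring it as a double keeps the values
as_double <- read_tsv(
  toy_path,
  col_types = cols(
    Species = col_character(),
    `Growth rate (1/days)` = col_double()
  )
)

as_logical[["Growth rate (1/days)"]]  # all NA
as_double[["Growth rate (1/days)"]]   # NA, NA, 0.212, 0.225
```

For the real file, adding "Growth rate (1/days)" = col_double() to the col_types list in the read_tsv() call should clear these failures; str() also reports Weaning (days) and Weaning weight (g) as logical, so they may need the same treatment. Alternatively, read_tsv()'s guess_max argument makes readr inspect more rows before guessing.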
# Look at the data using str()
str(anage)
## tibble [4,212 × 31] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ HAGRID                          : chr [1:4212] "00004" "00005" "00007" "00008" ...
##  $ Kingdom                         : chr [1:4212] "Animalia" "Animalia" "Animalia" "Animalia" ...
##  $ Phylum                          : chr [1:4212] "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
##  $ Class                           : chr [1:4212] "Insecta" "Insecta" "Insecta" "Insecta" ...
##  $ Order                           : chr [1:4212] "Diptera" "Hymenoptera" "Hymenoptera" "Lepidoptera" ...
##  $ Family                          : chr [1:4212] "Drosophilidae" "Apidae" "Formicidae" "Nymphalidae" ...
##  $ Genus                           : chr [1:4212] "Drosophila" "Apis" "Lasius" "Bicyclus" ...
##  $ Species                         : chr [1:4212] "melanogaster" "mellifera" "niger" "anynana" ...
##  $ Common name                     : chr [1:4212] "Fruit fly" "Honey bee" "Black garden ant" "Squinting bush brown" ...
##  $ Female maturity (days)          : num [1:4212] 7 NA NA 15 NA ...
##  $ Male maturity (days)            : num [1:4212] 7 NA NA 15 NA NA NA 2920 4380 NA ...
##  $ Gestation/Incubation (days)     : num [1:4212] NA NA NA NA NA 13 NA 6 NA NA ...
##  $ Weaning (days)                  : logi [1:4212] NA NA NA NA NA NA ...
##  $ Litter/Clutch size              : num [1:4212] NA NA NA NA NA 120000 NA 350000 NA NA ...
##  $ Litters/Clutches per year       : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ Inter-litter/Interbirth interval: num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ Birth weight (g)                : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ Weaning weight (g)              : logi [1:4212] NA NA NA NA NA NA ...
##  $ Adult weight (g)                : num [1:4212] NA NA NA NA NA ...
##  $ Growth rate (1/days)            : logi [1:4212] NA NA NA NA NA NA ...
##  $ Maximum longevity (yrs)         : num [1:4212] 0.3 8 28 0.5 100 67 100 152 46 60 ...
##  $ Source                          : chr [1:4212] NA "812" "411" "811" ...
##  $ Specimen origin                 : chr [1:4212] "captivity" "unknown" "unknown" "wild" ...
##  $ Sample size                     : Ord.factor w/ 5 levels "tiny"<"small"<..: 4 3 3 3 3 3 3 3 3 3 ...
##  $ Data quality                    : Ord.factor w/ 4 levels "low"<"questionable"<..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ IMR (per yr)                    : num [1:4212] 0.05 NA NA NA NA NA NA 0.013 NA NA ...
##  $ MRDT (yrs)                      : num [1:4212] 0.04 NA NA NA NA NA NA 10 NA NA ...
##  $ Metabolic rate (W)              : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ Body mass (g)                   : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ Temperature (K)                 : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
##  $ References                      : chr [1:4212] "2,20,32,47,53,68,69,240,241,242,243,274,602,981,1150" "63,407,408,741,805,806,808,812,815,828,830,831,847,848,902,908,1143" "411,813,814" "418,809,811" ...
##  - attr(*, "problems")= tibble [1,787 × 5] (S3: tbl_df/tbl/data.frame)
##   ..$ row     : int [1:1787] 1004 1005 1006 1011 1012 1016 1018 1019 1022 1023 ...
##   ..$ col     : chr [1:1787] "Growth rate (1/days)" "Growth rate (1/days)" "Growth rate (1/days)" "Growth rate (1/days)" ...
##   ..$ expected: chr [1:1787] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
##   ..$ actual  : chr [1:1787] "0.212" "0.225" "0.258" "0.126" ...
##   ..$ file    : chr [1:1787] "'data/anage_build14.txt'" "'data/anage_build14.txt'" "'data/anage_build14.txt'" "'data/anage_build14.txt'" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   HAGRID = col_character(),
##   ..   Kingdom = col_character(),
##   ..   Phylum = col_character(),
##   ..   Class = col_character(),
##   ..   Order = col_character(),
##   ..   Family = col_character(),
##   ..   Genus = col_character(),
##   ..   Species = col_character(),
##   ..   `Common name` = col_character(),
##   ..   `Female maturity (days)` = col_double(),
##   ..   `Male maturity (days)` = col_double(),
##   ..   `Gestation/Incubation (days)` = col_double(),
##   ..   `Weaning (days)` = col_logical(),
##   ..   `Litter/Clutch size` = col_double(),
##   ..   `Litters/Clutches per year` = col_double(),
##   ..   `Inter-litter/Interbirth interval` = col_double(),
##   ..   `Birth weight (g)` = col_double(),
##   ..   `Weaning weight (g)` = col_logical(),
##   ..   `Adult weight (g)` = col_double(),
##   ..   `Growth rate (1/days)` = col_logical(),
##   ..   `Maximum longevity (yrs)` = col_double(),
##   ..   Source = col_character(),
##   ..   `Specimen origin` = col_character(),
##   ..   `Sample size` = col_factor(levels = c("tiny", "small", "medium", "large", "huge"), ordered = TRUE, include_na = FALSE),
##   ..   `Data quality` = col_factor(levels = c("low", "questionable", "acceptable", "high"), ordered = TRUE, include_na = FALSE),
##   ..   `IMR (per yr)` = col_double(),
##   ..   `MRDT (yrs)` = col_double(),
##   ..   `Metabolic rate (W)` = col_double(),
##   ..   `Body mass (g)` = col_double(),
##   ..   `Temperature (K)` = col_double(),
##   ..   References = col_character()
##   .. )

Using str() is a quick way to see the different columns of data and the type of data in each. Of interest here are the taxonomic columns, the Adult weight (g) column, and the Maximum longevity (yrs) column. You can also use head() to look at the first few rows:

head(anage)
## # A tibble: 6 x 31
##   HAGRID Kingdom Phylum Class Order Family Genus Species `Common name`
##   <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>   <chr>        
## 1 00004  Animal… Arthr… Inse… Dipt… Droso… Dros… melano… Fruit fly    
## 2 00005  Animal… Arthr… Inse… Hyme… Apidae Apis  mellif… Honey bee    
## 3 00007  Animal… Arthr… Inse… Hyme… Formi… Lasi… niger   Black garden…
## 4 00008  Animal… Arthr… Inse… Lepi… Nymph… Bicy… anynana Squinting bu…
## 5 00009  Animal… Arthr… Mala… Deca… Nephr… Homa… americ… American lob…
## 6 00011  Animal… Chord… Acti… Acip… Acipe… Acip… brevir… Shortnose st…
## # … with 22 more variables: `Female maturity (days)` <dbl>, `Male maturity
## #   (days)` <dbl>, `Gestation/Incubation (days)` <dbl>, `Weaning (days)` <lgl>,
## #   `Litter/Clutch size` <dbl>, `Litters/Clutches per year` <dbl>,
## #   `Inter-litter/Interbirth interval` <dbl>, `Birth weight (g)` <dbl>,
## #   `Weaning weight (g)` <lgl>, `Adult weight (g)` <dbl>, `Growth rate
## #   (1/days)` <lgl>, `Maximum longevity (yrs)` <dbl>, Source <chr>, `Specimen
## #   origin` <chr>, `Sample size` <ord>, `Data quality` <ord>, `IMR (per
## #   yr)` <dbl>, `MRDT (yrs)` <dbl>, `Metabolic rate (W)` <dbl>, `Body mass
## #   (g)` <dbl>, `Temperature (K)` <dbl>, References <chr>

Because anage is a tibble (the tidyverse’s version of a data frame), typing anage by itself also prints only the first ten rows and as many columns as fit on the screen.
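Since the plots below need both an adult weight and a maximum lifespan for each species, it is worth checking how many rows actually have both. A minimal sketch with dplyr on a few hypothetical rows that reuse anage’s column names (the values are invented); on the real data, the same summarise() call would run on anage directly:

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows using the same column names as anage
toy <- tibble(
  Class = c("Mammalia", "Aves", "Reptilia", "Insecta", "Actinopterygii"),
  `Adult weight (g)` = c(5000, 30, 800, NA, NA),
  `Maximum longevity (yrs)` = c(20, 15, NA, 0.3, 35)
)

# How many species have each variable, and how many have both?
coverage <- toy %>%
  summarise(
    with_weight   = sum(!is.na(`Adult weight (g)`)),
    with_lifespan = sum(!is.na(`Maximum longevity (yrs)`)),
    with_both     = sum(complete.cases(`Adult weight (g)`, `Maximum longevity (yrs)`))
  )
coverage
```

Only rows counted in with_both can appear as points on a weight versus lifespan plot; ggplot2 drops the rest with a "Removed ... rows" warning.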

All species

Let’s try graphing now. First, we will plot adult weight against maximum lifespan for all species, coloring each point by its phylum:

# Create the basic plot
p.all <- anage %>% 
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Phylum, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  labs(
    title='AnAge Species Adult Weight vs Lifespan',
    x='Adult Weight (g, log scale)',
    y='Maximum Longevity (yrs, log scale)'
  ) + 
  theme_pubclean() + 
  labs_pubr()
# Output the interactive plot
ggplotly(p.all)

(Note that we put both axes on a log scale. Adult weights and lifespans span several orders of magnitude, and on a log scale equal ratios take up equal distances: going from 1 to 10 grams covers the same distance as going from 100 to 1000 grams. In other words, we care less about absolute differences, such as 1 gram versus 100 grams of extra weight, than about fold changes, which lets orders-of-magnitude trends stand out.)
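A quick way to see what the log scale does: positions along a log10 axis are log10 values, so the distance between two points is the difference of their logarithms. The hypothetical helper log_distance() below (not part of any package) makes this concrete:

```r
# Hypothetical helper: distance between two values on a log10 axis
log_distance <- function(from, to) log10(to) - log10(from)

# A doubling spans the same distance at any scale (about 0.301)
log_distance(1, 2)
log_distance(100, 200)

# So does a tenfold change: exactly one order of magnitude
log_distance(1, 10)
log_distance(100, 1000)

# The same absolute change of +1 g shrinks as the values grow:
# compare about 0.301 for 1 -> 2 g with about 0.0043 here
log_distance(100, 101)
```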

You can see that the graph already has a clear upward trend! However, the color scheme reveals a striking problem: where are our non-chordate species? My first guess is that they are missing size measurements, which is relatively easy to check:

anage %>% 
  filter(!Phylum=="Chordata") %>% 
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, contains("weight"))
## # A tibble: 16 x 9
##    Kingdom Phylum Genus Species `Common name` `Maximum longev… `Birth weight (…
##    <chr>   <chr>  <chr> <chr>   <chr>                    <dbl>            <dbl>
##  1 Animal… Arthr… Dros… melano… Fruit fly                 0.3                NA
##  2 Animal… Arthr… Apis  mellif… Honey bee                 8                  NA
##  3 Animal… Arthr… Lasi… niger   Black garden…            28                  NA
##  4 Animal… Arthr… Bicy… anynana Squinting bu…             0.5                NA
##  5 Animal… Arthr… Homa… americ… American lob…           100                  NA
##  6 Animal… Cnida… Turr… nutric… Immortal jel…            NA                  NA
##  7 Animal… Echin… Stro… franci… Red sea urch…           200                  NA
##  8 Animal… Echin… Stro… purpur… Purple sea u…            50                  NA
##  9 Animal… Mollu… Arct… island… Ocean quahog…           507                  NA
## 10 Animal… Nemat… Caen… elegans Roundworm                 0.16               NA
## 11 Animal… Porif… Cina… antarc… Epibenthic s…          1550                  NA
## 12 Animal… Porif… Scol… joubini Hexactinelli…         15000                  NA
## 13 Plantae Pinop… Pinus longae… Great Basin …          4713                  NA
## 14 Fungi   Ascom… Sacc… cerevi… Baker's yeast             0.04               NA
## 15 Fungi   Ascom… Schi… pombe   Fission yeast            NA                  NA
## 16 Fungi   Ascom… Podo… anseri… Filamentous …            NA                  NA
## # … with 2 more variables: `Weaning weight (g)` <lgl>, `Adult weight (g)` <dbl>
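Scanning the weight columns by eye works for 16 rows, but counting per phylum how many species have an adult weight scales better. The rows below are hypothetical stand-ins that reuse anage’s column names; replacing toy with anage would run the same check on the real data:

```r
library(dplyr)
library(tibble)

# Hypothetical toy rows standing in for anage
toy <- tibble(
  Phylum = c("Chordata", "Chordata", "Chordata", "Arthropoda", "Porifera"),
  `Adult weight (g)` = c(495, 1e8, NA, NA, NA)
)

# Per phylum: how many species there are, and how many have an adult weight
weight_coverage <- toy %>%
  group_by(Phylum) %>%
  summarise(
    n_species = n(),
    n_with_weight = sum(!is.na(`Adult weight (g)`))
  )
weight_coverage
```

A phylum with n_with_weight equal to 0 cannot contribute any points to the weight versus lifespan plot.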

None of the non-chordates in the AnAge database have any weight information. Go figure! Their lifespans, however, show that some of these species live a ridiculously long time:

# Filter anage based on the weird characteristics:
anage %>% 
  arrange(desc(`Maximum longevity (yrs)`)) %>% 
  head %>% 
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, `Adult weight (g)`, `Data quality`)
## # A tibble: 6 x 8
##   Kingdom Phylum Genus Species `Common name` `Maximum longev… `Adult weight (…
##   <chr>   <chr>  <chr> <chr>   <chr>                    <dbl>            <dbl>
## 1 Animal… Porif… Scol… joubini Hexactinelli…            15000              NA 
## 2 Plantae Pinop… Pinus longae… Great Basin …             4713              NA 
## 3 Animal… Porif… Cina… antarc… Epibenthic s…             1550              NA 
## 4 Animal… Mollu… Arct… island… Ocean quahog…              507              NA 
## 5 Animal… Chord… Bala… mystic… Bowhead whale              211       100000000.
## 6 Animal… Chord… Seba… aleuti… Rougheye roc…              205             495 
## # … with 1 more variable: `Data quality` <ord>

Remember how I said that some organisms live for millennia? Behold the humble sponge; specifically, Scolymastra joubini, which apparently lives for 15,000 years! It’s worth noting the Data quality column here: there is some skepticism in the literature as to whether this estimate is real, since it is so incredible. Runners-up include the Great Basin bristlecone pine, the ocean quahog clam, the Greenland shark, and my favorite, the bowhead whale!
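Because Data quality was read as an ordered factor (low < questionable < acceptable < high), it can be compared with >= to keep only better-supported records. Comparing plain strings would sort alphabetically and get this wrong, since "high" comes before "low". A sketch on hypothetical rows (names, lifespans and quality labels are invented):

```r
library(dplyr)
library(tibble)

quality_levels <- c("low", "questionable", "acceptable", "high")

# Hypothetical toy rows; the real values live in anage
toy <- tibble(
  `Common name` = c("species A", "species B", "species C"),
  `Maximum longevity (yrs)` = c(300, 120, 40),
  `Data quality` = factor(c("low", "acceptable", "high"),
                          levels = quality_levels, ordered = TRUE)
)

# >= on an ordered factor compares level positions, not spelling
trusted <- toy %>%
  filter(`Data quality` >= "acceptable") %>%
  arrange(desc(`Maximum longevity (yrs)`))
trusted
```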

Chordates

Moving on, let’s graph just the chordates, colored by class. While we’re at it, let’s also quantify the relationship between size and lifespan by adding a linear regression line:

anage.chordata <- anage %>% 
  filter(
    Phylum=="Chordata",
    !is.na(`Adult weight (g)`),
    !is.na(`Maximum longevity (yrs)`)
      )


# Basic Graph
p.chordata <- anage.chordata %>% 
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Class, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
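  # One regression line across all classes: inherit.aes = FALSE stops the
  # Class color and the per-point text tooltip from splitting the fit into groups
  geom_smooth(
    method='lm', 
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`), 
    inherit.aes = FALSE,
    col="black",
    lty="dashed"
    )+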
  labs(
    title='Chordates Adult Weight vs Lifespan',
    x='Adult Weight (g, log scale)',
    y='Maximum Longevity (yrs, log scale)'
  ) +
  theme_pubclean() + 
  labs_pubr()

# Output the interactive plot
ggplotly(p.chordata)
## `geom_smooth()` using formula 'y ~ x'
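geom_smooth(method='lm') draws the regression line but does not print its coefficients. Because scale_x_log10() and scale_y_log10() transform the data before the smoothing statistic runs, the dashed line is a fit of log10(lifespan) on log10(weight), so its slope is the exponent b in lifespan ∝ weight^b. A minimal sketch fitting the same model directly with lm(), on hypothetical data built from an exact power law (lifespan = 2 × weight^0.25) so the expected coefficients are known in advance:

```r
library(tibble)

# Hypothetical toy data following lifespan = 2 * weight^0.25 exactly
weights <- c(1, 10, 100, 1000, 10000)
toy <- tibble(
  `Adult weight (g)` = weights,
  `Maximum longevity (yrs)` = 2 * weights^0.25
)

# Fit on log10-transformed values, matching the log10 axes of the plot
fit <- lm(log10(`Maximum longevity (yrs)`) ~ log10(`Adult weight (g)`), data = toy)

# Intercept = log10(2), about 0.301; slope = 0.25
coef(fit)
```

On the real data, the same call with data = anage.chordata would report the intercept and slope behind the dashed line; 10 raised to the intercept is the predicted maximum lifespan of a 1 g chordate.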