The Relationship between Body Size and Lifespan
Introduction
While aging is an inevitable process for most species, there is an incredible diversity of lifespans throughout the Tree of Life, ranging from a few days to several millennia. For researchers interested in the fundamental biology behind aging, seeing which aspects of an organism’s biology correlate with lifespan is an important first step on the path to finding concrete explanations for its longevity.
For example, in 1975, Dr. Richard Peto published a paper where he established that the different sizes and lifespans of humans and mice didn’t really relate to their respective cancer rates. This became known as Peto’s Paradox, because the original expectation was that, over a lifetime, every cell accumulates mutations that could eventually cause it to become cancerous; an animal with more cells, or a longer life, should therefore face a higher lifetime risk of cancer. In fact, it turns out that there is no relationship between body size, lifespan, and cancer rates across species, which is the fact that underlies the focus of my own research!
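To make the expectation behind Peto’s Paradox concrete, here is a toy calculation in R. It assumes that every cell has the same small, independent chance per year of becoming cancerous. The function name lifetime_risk, the probability p, and the cell counts and lifespans are all made-up illustrative values, not measurements; the only point is how quickly the naive prediction grows with cell number and lifespan.

```r
# Toy model: probability that at least one cell becomes cancerous over a
# lifetime, assuming independent and identical risk per cell per year.
# All parameter values below are hypothetical and chosen only for illustration.
lifetime_risk <- function(n_cells, lifespan_yrs, p_per_cell_year) {
  # 1 - (1 - p)^(n_cells * lifespan_yrs), computed on the log scale
  # so that very small probabilities do not round to zero
  -expm1(n_cells * lifespan_yrs * log1p(-p_per_cell_year))
}

p <- 1e-13  # hypothetical per-cell, per-year probability

# A small, short-lived animal versus one with 1000 times as many cells
# that lives 30 times as long
small <- lifetime_risk(n_cells = 1e9,  lifespan_yrs = 3,  p_per_cell_year = p)
large <- lifetime_risk(n_cells = 1e12, lifespan_yrs = 90, p_per_cell_year = p)

c(small = small, large = large)
```

Under these assumptions the larger animal’s predicted risk is close to certainty while the smaller animal’s is tiny, which is exactly the pattern that real cancer rates fail to show.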
As we will explore in this section, this paradox is further complicated by another unexpected relationship: animals that are larger tend to also live longer.
Graphing the Data
For this analysis, we will be using the AnAge database of ageing and life history in animals. This database has entries for over 4200 species of animals (plus 2 plants and 3 fungi), with data such as maximum lifespan, growth rates, weights at different life stages, metabolism, and descriptions.
First, let’s take a look at the data itself:
# These are the packages we will be using in this analysis
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.3
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tibble' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.3
## Warning: package 'readr' was built under R version 3.6.3
## Warning: package 'purrr' was built under R version 3.6.3
## Warning: package 'dplyr' was built under R version 3.6.3
## Warning: package 'stringr' was built under R version 3.6.3
## Warning: package 'forcats' was built under R version 3.6.3
options(readr.num_columns = 0)
library(ggpubr)
library(plotly)
# Read the data into a dataframe:
anage <- read_tsv("data/anage_build14.txt",
  col_names = TRUE,
  col_types = list(
    "References" = col_character(), # Needs to be specified, or else it's interpreted as <int>
    "Sample size" = col_factor(c("tiny", "small", "medium", "large", "huge"), ordered = TRUE),
    "Data quality" = col_factor(c("low", "questionable", "acceptable", "high"), ordered = TRUE)
  )
)
## Warning: 1787 parsing failures.
## row col expected actual file
## 1004 Growth rate (1/days) 1/0/T/F/TRUE/FALSE 0.212 'data/anage_build14.txt'
## 1005 Growth rate (1/days) 1/0/T/F/TRUE/FALSE 0.225 'data/anage_build14.txt'
## 1006 Growth rate (1/days) 1/0/T/F/TRUE/FALSE 0.258 'data/anage_build14.txt'
## 1011 Growth rate (1/days) 1/0/T/F/TRUE/FALSE 0.126 'data/anage_build14.txt'
## 1012 Growth rate (1/days) 1/0/T/F/TRUE/FALSE 0.154 'data/anage_build14.txt'
## .... .................... .................. ...... ........................
## See problems(...) for more details.
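The parsing failures above arise because readr guesses each column’s type from the first rows of the file (1,000 by default, controlled by the guess_max argument). A column such as Growth rate (1/days) that is empty in all of those rows gets guessed as logical, so the numbers that appear later cannot be parsed. One way around this, sketched here on a small made-up file rather than on the AnAge file itself, is to declare the column type explicitly in col_types. The file, its column names, and its values are hypothetical stand-ins.

```r
library(readr)

# Build a small stand-in file (hypothetical values): the numeric column is
# missing for the first 1,200 rows, so type guessing never sees a number.
toy <- data.frame(
  species = paste0("sp", 1:1500),
  growth  = c(rep(NA, 1200), seq(0.1, 0.4, length.out = 300))
)
toy_path <- tempfile(fileext = ".tsv")
write_tsv(toy, toy_path)

# Declaring the column as double means readr does not have to guess
toy_fixed <- read_tsv(
  toy_path,
  col_types = cols(species = col_character(), growth = col_double())
)

# Number of numeric values that survived the import
sum(!is.na(toy_fixed$growth))
```

The same idea would presumably carry over to the anage import by adding an entry such as "Growth rate (1/days)" = col_double() to its col_types list; raising guess_max is an alternative when many columns are affected.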
# Look at the data using str()
str(anage)
## tibble [4,212 × 31] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ HAGRID : chr [1:4212] "00004" "00005" "00007" "00008" ...
## $ Kingdom : chr [1:4212] "Animalia" "Animalia" "Animalia" "Animalia" ...
## $ Phylum : chr [1:4212] "Arthropoda" "Arthropoda" "Arthropoda" "Arthropoda" ...
## $ Class : chr [1:4212] "Insecta" "Insecta" "Insecta" "Insecta" ...
## $ Order : chr [1:4212] "Diptera" "Hymenoptera" "Hymenoptera" "Lepidoptera" ...
## $ Family : chr [1:4212] "Drosophilidae" "Apidae" "Formicidae" "Nymphalidae" ...
## $ Genus : chr [1:4212] "Drosophila" "Apis" "Lasius" "Bicyclus" ...
## $ Species : chr [1:4212] "melanogaster" "mellifera" "niger" "anynana" ...
## $ Common name : chr [1:4212] "Fruit fly" "Honey bee" "Black garden ant" "Squinting bush brown" ...
## $ Female maturity (days) : num [1:4212] 7 NA NA 15 NA ...
## $ Male maturity (days) : num [1:4212] 7 NA NA 15 NA NA NA 2920 4380 NA ...
## $ Gestation/Incubation (days) : num [1:4212] NA NA NA NA NA 13 NA 6 NA NA ...
## $ Weaning (days) : logi [1:4212] NA NA NA NA NA NA ...
## $ Litter/Clutch size : num [1:4212] NA NA NA NA NA 120000 NA 350000 NA NA ...
## $ Litters/Clutches per year : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ Inter-litter/Interbirth interval: num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ Birth weight (g) : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ Weaning weight (g) : logi [1:4212] NA NA NA NA NA NA ...
## $ Adult weight (g) : num [1:4212] NA NA NA NA NA ...
## $ Growth rate (1/days) : logi [1:4212] NA NA NA NA NA NA ...
## $ Maximum longevity (yrs) : num [1:4212] 0.3 8 28 0.5 100 67 100 152 46 60 ...
## $ Source : chr [1:4212] NA "812" "411" "811" ...
## $ Specimen origin : chr [1:4212] "captivity" "unknown" "unknown" "wild" ...
## $ Sample size : Ord.factor w/ 5 levels "tiny"<"small"<..: 4 3 3 3 3 3 3 3 3 3 ...
## $ Data quality : Ord.factor w/ 4 levels "low"<"questionable"<..: 3 3 3 3 3 3 3 3 3 3 ...
## $ IMR (per yr) : num [1:4212] 0.05 NA NA NA NA NA NA 0.013 NA NA ...
## $ MRDT (yrs) : num [1:4212] 0.04 NA NA NA NA NA NA 10 NA NA ...
## $ Metabolic rate (W) : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ Body mass (g) : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ Temperature (K) : num [1:4212] NA NA NA NA NA NA NA NA NA NA ...
## $ References : chr [1:4212] "2,20,32,47,53,68,69,240,241,242,243,274,602,981,1150" "63,407,408,741,805,806,808,812,815,828,830,831,847,848,902,908,1143" "411,813,814" "418,809,811" ...
## - attr(*, "problems")= tibble [1,787 × 5] (S3: tbl_df/tbl/data.frame)
## ..$ row : int [1:1787] 1004 1005 1006 1011 1012 1016 1018 1019 1022 1023 ...
## ..$ col : chr [1:1787] "Growth rate (1/days)" "Growth rate (1/days)" "Growth rate (1/days)" "Growth rate (1/days)" ...
## ..$ expected: chr [1:1787] "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" "1/0/T/F/TRUE/FALSE" ...
## ..$ actual : chr [1:1787] "0.212" "0.225" "0.258" "0.126" ...
## ..$ file : chr [1:1787] "'data/anage_build14.txt'" "'data/anage_build14.txt'" "'data/anage_build14.txt'" "'data/anage_build14.txt'" ...
## - attr(*, "spec")=
## .. cols(
## .. HAGRID = col_character(),
## .. Kingdom = col_character(),
## .. Phylum = col_character(),
## .. Class = col_character(),
## .. Order = col_character(),
## .. Family = col_character(),
## .. Genus = col_character(),
## .. Species = col_character(),
## .. `Common name` = col_character(),
## .. `Female maturity (days)` = col_double(),
## .. `Male maturity (days)` = col_double(),
## .. `Gestation/Incubation (days)` = col_double(),
## .. `Weaning (days)` = col_logical(),
## .. `Litter/Clutch size` = col_double(),
## .. `Litters/Clutches per year` = col_double(),
## .. `Inter-litter/Interbirth interval` = col_double(),
## .. `Birth weight (g)` = col_double(),
## .. `Weaning weight (g)` = col_logical(),
## .. `Adult weight (g)` = col_double(),
## .. `Growth rate (1/days)` = col_logical(),
## .. `Maximum longevity (yrs)` = col_double(),
## .. Source = col_character(),
## .. `Specimen origin` = col_character(),
## .. `Sample size` = col_factor(levels = c("tiny", "small", "medium", "large", "huge"), ordered = TRUE, include_na = FALSE),
## .. `Data quality` = col_factor(levels = c("low", "questionable", "acceptable", "high"), ordered = TRUE, include_na = FALSE),
## .. `IMR (per yr)` = col_double(),
## .. `MRDT (yrs)` = col_double(),
## .. `Metabolic rate (W)` = col_double(),
## .. `Body mass (g)` = col_double(),
## .. `Temperature (K)` = col_double(),
## .. References = col_character()
## .. )
Using str() is a useful way of quickly seeing the different columns of data and the type of data in each. Of interest are the taxonomic columns, the Adult weight (g) column, and the Maximum longevity (yrs) column. You can also use head() to look at the first few rows:
head(anage)
## # A tibble: 6 x 31
## HAGRID Kingdom Phylum Class Order Family Genus Species `Common name`
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 00004 Animal… Arthr… Inse… Dipt… Droso… Dros… melano… Fruit fly
## 2 00005 Animal… Arthr… Inse… Hyme… Apidae Apis mellif… Honey bee
## 3 00007 Animal… Arthr… Inse… Hyme… Formi… Lasi… niger Black garden…
## 4 00008 Animal… Arthr… Inse… Lepi… Nymph… Bicy… anynana Squinting bu…
## 5 00009 Animal… Arthr… Mala… Deca… Nephr… Homa… americ… American lob…
## 6 00011 Animal… Chord… Acti… Acip… Acipe… Acip… brevir… Shortnose st…
## # … with 22 more variables: `Female maturity (days)` <dbl>, `Male maturity
## # (days)` <dbl>, `Gestation/Incubation (days)` <dbl>, `Weaning (days)` <lgl>,
## # `Litter/Clutch size` <dbl>, `Litters/Clutches per year` <dbl>,
## # `Inter-litter/Interbirth interval` <dbl>, `Birth weight (g)` <dbl>,
## # `Weaning weight (g)` <lgl>, `Adult weight (g)` <dbl>, `Growth rate
## # (1/days)` <lgl>, `Maximum longevity (yrs)` <dbl>, Source <chr>, `Specimen
## # origin` <chr>, `Sample size` <ord>, `Data quality` <ord>, `IMR (per
## # yr)` <dbl>, `MRDT (yrs)` <dbl>, `Metabolic rate (W)` <dbl>, `Body mass
## # (g)` <dbl>, `Temperature (K)` <dbl>, References <chr>
Using a tibble from tidyverse also automatically shows only the first few rows of the dataset if you call anage by itself.
All species
Let’s try graphing now. First, we will graph all the species, plotting adult weight against maximum lifespan and coloring the data points by Phylum:
# Create the basic plot
p.all <- anage %>%
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Phylum, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  # x is adult weight and y is maximum longevity, matching the order in aes()
  labs(
    title='AnAge Species Adult Weight vs Lifespan',
    x='Adult Weight - log(g)',
    y='Maximum Longevity - log(yrs)'
  ) +
  theme_pubclean() +
  labs_pubr()
# Output the interactive plot
ggplotly(p.all)
(Note that we scaled the axes using a log scale. We do this to highlight orders-of-magnitude changes over small absolute changes: on a log scale, a change from 1 to 10 grams covers the same distance as a change from 100 to 1000 grams, whereas on a linear scale the first would be invisible next to the second.)
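A quick way to see what the log scale does is to compare the gaps between tenfold steps before and after taking log10. The weights below are arbitrary example values, not taken from AnAge.

```r
# Arbitrary example weights, each ten times the previous one
weights_g <- c(1, 10, 100, 1000)

# On the raw scale, the gap between neighbours grows tenfold at each step
raw_gaps <- diff(weights_g)

# On the log10 scale, every tenfold step covers the same distance
log_gaps <- diff(log10(weights_g))

raw_gaps
log_gaps
```

This is why scale_x_log10() and scale_y_log10() spread small and large species evenly across the plot instead of crowding the small ones into a corner.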
You can see that the graph already has a clear upwards trend! However, the color scheme reveals a striking issue: where are our non-chordate species? My first guess is that it relates to size measurements, which is relatively easy to check:
anage %>%
  filter(Phylum != "Chordata") %>%
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, contains("weight"))
## # A tibble: 16 x 9
## Kingdom Phylum Genus Species `Common name` `Maximum longev… `Birth weight (…
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Animal… Arthr… Dros… melano… Fruit fly 0.3 NA
## 2 Animal… Arthr… Apis mellif… Honey bee 8 NA
## 3 Animal… Arthr… Lasi… niger Black garden… 28 NA
## 4 Animal… Arthr… Bicy… anynana Squinting bu… 0.5 NA
## 5 Animal… Arthr… Homa… americ… American lob… 100 NA
## 6 Animal… Cnida… Turr… nutric… Immortal jel… NA NA
## 7 Animal… Echin… Stro… franci… Red sea urch… 200 NA
## 8 Animal… Echin… Stro… purpur… Purple sea u… 50 NA
## 9 Animal… Mollu… Arct… island… Ocean quahog… 507 NA
## 10 Animal… Nemat… Caen… elegans Roundworm 0.16 NA
## 11 Animal… Porif… Cina… antarc… Epibenthic s… 1550 NA
## 12 Animal… Porif… Scol… joubini Hexactinelli… 15000 NA
## 13 Plantae Pinop… Pinus longae… Great Basin … 4713 NA
## 14 Fungi Ascom… Sacc… cerevi… Baker's yeast 0.04 NA
## 15 Fungi Ascom… Schi… pombe Fission yeast NA NA
## 16 Fungi Ascom… Podo… anseri… Filamentous … NA NA
## # … with 2 more variables: `Weaning weight (g)` <lgl>, `Adult weight (g)` <dbl>
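Because the printed table cuts off the Adult weight (g) column, a per-phylum count of non-missing weights is a more direct check. Here is a minimal sketch on a toy data frame; the rows are hypothetical placeholders rather than real AnAge values, and weight_coverage is just an illustrative name. With the real data, anage would take the place of toy.

```r
library(dplyr)

# Toy stand-in for anage (hypothetical rows, not real AnAge values)
toy <- data.frame(
  Phylum = c("Chordata", "Chordata", "Arthropoda", "Porifera"),
  `Adult weight (g)` = c(20, NA, NA, NA),
  check.names = FALSE  # keep the column name with spaces and parentheses
)

# For each phylum: how many species, and how many have an adult weight?
weight_coverage <- toy %>%
  group_by(Phylum) %>%
  summarise(
    n_species = n(),
    n_with_weight = sum(!is.na(`Adult weight (g)`))
  )

weight_coverage
```

A phylum whose n_with_weight is 0 can never appear on a plot that uses adult weight as an axis.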
None of the non-chordates in the AnAge database have any weight information - go figure! However, from the lifespan, we can see that some of these live a ridiculously long time:
# Sort by maximum longevity and show the six longest-lived entries:
anage %>%
  arrange(desc(`Maximum longevity (yrs)`)) %>%
  head() %>%
  select(Kingdom, Phylum, Genus, Species, `Common name`, `Maximum longevity (yrs)`, `Adult weight (g)`, `Data quality`)
## # A tibble: 6 x 8
## Kingdom Phylum Genus Species `Common name` `Maximum longev… `Adult weight (…
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Animal… Porif… Scol… joubini Hexactinelli… 15000 NA
## 2 Plantae Pinop… Pinus longae… Great Basin … 4713 NA
## 3 Animal… Porif… Cina… antarc… Epibenthic s… 1550 NA
## 4 Animal… Mollu… Arct… island… Ocean quahog… 507 NA
## 5 Animal… Chord… Bala… mystic… Bowhead whale 211 100000000.
## 6 Animal… Chord… Seba… aleuti… Rougheye roc… 205 495
## # … with 1 more variable: `Data quality` <ord>
Remember how I said that some animals live for millennia? Behold the humble sponge; specifically, Scolymastra joubini, which apparently lives for 15,000 years! It’s worth noting the “Data quality” column here; there’s some skepticism in the literature as to whether or not this estimate is real, since it’s so incredible. Runners-up include the Great Basin bristlecone pine, the ocean quahog clam, the rougheye rockfish, and my favorite, the bowhead whale!
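Since Data quality was read in as an ordered factor, it can be compared against a level name directly, which makes it easy to set aside doubtful records. The sketch below uses a toy data frame with placeholder species names and made-up values; only the four quality levels come from the import code earlier in this section.

```r
library(dplyr)

# The same ordering of levels that was used when reading the AnAge file
quality_levels <- c("low", "questionable", "acceptable", "high")

# Hypothetical records; species names and values are placeholders
toy <- data.frame(
  species = c("sp_a", "sp_b", "sp_c", "sp_d"),
  longevity_yrs = c(15000, 500, 200, 30),
  quality = factor(
    c("questionable", "acceptable", "high", "low"),
    levels = quality_levels, ordered = TRUE
  )
)

# Ordered factors support comparisons, so this keeps "acceptable" and "high"
reliable <- toy %>% filter(quality >= "acceptable")

reliable
```

Whether such a cutoff is appropriate depends on the question being asked; it is shown here only as one way to use the column.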
Chordates
Moving on, let us graph the chordates, and color by class. Also, while we’re at it, let’s quantify the relationship between size and lifespan using a linear regression:
anage.chordata <- anage %>%
  filter(
    Phylum == "Chordata",
    !is.na(`Adult weight (g)`),
    !is.na(`Maximum longevity (yrs)`)
  )
# Basic Graph
p.chordata <- anage.chordata %>%
  ggplot(
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`, color=Class, text=str_glue("Common Name: {`Common name`}<br>Data Quality: {`Data quality`}<br>Sample size: {`Sample size`}"))
  ) +
  geom_point(size=0.5) +
  scale_x_log10() +
  scale_y_log10() +
  # One regression line across all classes, fitted on the log-transformed axes
  geom_smooth(
    method='lm',
    aes(`Adult weight (g)`, `Maximum longevity (yrs)`),
    inherit.aes = FALSE,
    col="black",
    lty="dashed"
  ) +
  # x is adult weight and y is maximum longevity, matching the order in aes()
  labs(
    title='Chordates Adult Weight vs Lifespan',
    x='Adult Weight - log(g)',
    y='Maximum Longevity - log(yrs)'
  ) +
  theme_pubclean() +
  labs_pubr()
# Output the interactive plot
ggplotly(p.chordata)
## `geom_smooth()` using formula 'y ~ x'
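The dashed line drawn by geom_smooth(method='lm') shows the fit but does not report its coefficients. Because both axes are log-transformed before the fit, the line corresponds to a linear model of log10(longevity) on log10(weight). Here is a minimal sketch of fitting that model directly. It runs on simulated data with made-up parameters (the exponent 0.2, the multiplier 2, and the noise level are arbitrary choices), so the numbers it prints say nothing about AnAge; with the real data, the two columns of anage.chordata would take the place of weight_g and longevity_yrs.

```r
set.seed(1)

# Simulated stand-in data (not AnAge): longevity follows a power law of
# weight with multiplicative noise. All parameters are arbitrary.
n <- 500
weight_g <- 10^runif(n, min = 0, max = 7)
longevity_yrs <- 2 * weight_g^0.2 * 10^rnorm(n, sd = 0.15)

# A straight line on log-log axes is a linear model on the logged values
fit <- lm(log10(longevity_yrs) ~ log10(weight_g))

coef(fit)               # intercept and slope on the log10 scale
summary(fit)$r.squared  # share of variance explained by the line
```

The slope is the exponent of the power law: a slope of b means that a tenfold increase in weight multiplies the expected longevity by 10^b.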