Diversity and Similarity Measures
This
site is aimed at ecologists rather than geneticists, but the
mathematical issues are the same. Click
here for the genetics version.
The biological literature about diversity and
similarity indices is a mess. Much of it is superficial and
confusing, and some of it is just plain wrong. Biologists frequently
confuse diversity with other quantities, and frequently use
diversity indices inappropriately. On the other hand many biologists
have sensed that there is something suspicious about the way
diversity indices are used, but their solution is to avoid frequency-based
diversity and similarity measures altogether and use only species
richness and the basic similarity measures derived from it,
like the Jaccard and Sorensen indices. Yet ecologically significant
differences between communities are really differences in species
frequencies, not mere presence or absence. Abusing or avoiding
the analysis of diversity and similarity holds back the whole
field.
What
is diversity? What diversity index should be used for a particular
purpose? How should diversity indices be interpreted? What is
the real definition of alpha and beta diversity: are alpha and
beta multiplicative, as in Whittaker's law (alpha times beta
equals gamma), or are they additive as in Lande's (1996) definition
(alpha plus beta equals gamma), or neither? What do similarity
and overlap measures really measure and which ones should be
used? How are they related to diversity measures, and how are
diversity measures related to each other? Though
biologists have often treated these questions as if they were
matters of opinion, in fact each of these questions has a definitive
answer. These topics have deep logical and mathematical foundations,
rich contexts, and many interconnections between them. These
foundations have not been appreciated by most biologists. Diversity
and similarity indices cannot be treated as random ingredients
in some analytical soup.
There
are at least three fundamental mistakes in the literature of
diversity analysis. First, the literature confuses diversity
with the indices used to measure it. An index such as the Shannon-Wiener
index is an entropy, not a diversity, and it must be converted
to a diversity before it can be properly interpreted. This confusion
is the subject of my paper published in 2006 in Oikos,
Entropy and diversity. The second fundamental
mistake is the incorrect definition of beta diversity. It is
easy to prove that the current general definitions of beta (additive,
as in Lande 1996, or multiplicative, as in Whittaker) don't
work for most indices. This is the subject of my paper, Partitioning
diversity into independent alpha and beta components,
in press for the "Concepts and Synthesis" section
of Ecology. The third mistake is a failure
to appreciate the difference between the magnitude
of an effect and the statistical significance
of that effect. Two diversity measurements might be significantly
different statistically, but this does not tell us much about
the real size of the difference. The magnitude of the difference
might be negligible or huge; the level of statistical significance
reached has little bearing on this magnitude. To go beyond mere
statistical significance and judge the real magnitude of an effect
(which is the more important question scientifically), it is
essential to have good, well-behaved, intuitive, informative,
and easily interpreted measures of diversity and similarity.
The new synthesis I propose gives us interpretable measures;
it is based on two key ideas. One is the concept of the effective
number of species (a quantity
introduced to ecology by Robert MacArthur in 1965 and developed
by Hill in 1973). As mentioned above, I explain
the importance of this in my Oikos article
Entropy
and diversity, with additional explanation here.
If you are comparing the diversities of two or more communities,
you must convert your diversity indices to effective numbers
of species, or you can reach wrong conclusions. Examples
are worth a thousand words, so I have collected some examples
(both imaginary and real) in Measuring
the diversity of a single community
and Comparing
the diversities of two communities. Here you will
see how misleading it can be to compare raw Shannon entropies
or Gini-Simpson indices, and yet how sensible the results can
be when the comparisons are done correctly using true diversities (effective
numbers of species).
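The conversion described above can be sketched in a few lines of Python. This is a minimal illustration, not a full implementation: the function names and the two perfectly even communities (5 and 10 equally common species) are hypothetical and chosen only for this example. It uses the standard conversions, the exponential of Shannon entropy and 1/(1 - Gini-Simpson), to turn raw index values into effective numbers of species.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy, -sum(p_i * ln p_i), skipping absent species."""
    return -sum(p * math.log(p) for p in freqs if p > 0)

def gini_simpson(freqs):
    """Gini-Simpson index, 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in freqs)

def effective_from_shannon(h):
    """Effective number of species for a Shannon entropy value."""
    return math.exp(h)

def effective_from_gini_simpson(g):
    """Effective number of species for a Gini-Simpson value."""
    return 1.0 / (1.0 - g)

# Hypothetical communities: 5 and 10 equally common species.
small = [1 / 5] * 5
large = [1 / 10] * 10

for name, community in (("small", small), ("large", large)):
    h = shannon_entropy(community)
    g = gini_simpson(community)
    print(
        name,
        "Shannon:", round(h, 3),
        "Gini-Simpson:", round(g, 3),
        "effective (Shannon):", round(effective_from_shannon(h), 3),
        "effective (Gini-Simpson):", round(effective_from_gini_simpson(g), 3),
    )
```

The raw Gini-Simpson values of the two communities differ only slightly (about 0.8 versus 0.9), while the effective numbers of species show that the second community is twice as diverse as the first (10 versus 5).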
The
other key idea in the new synthesis is the intuitive and widely
accepted notion that alpha and beta must be independent of each
other. They measure completely different aspects of regional
diversity: alpha measures the average within-community diversity
while beta measures the between-community component of diversity.
These are orthogonal dimensions which can vary independently:
a high value of the alpha component should not, by itself, force
the beta component to be high (or low), and vice versa. This
mathematical independence between alpha and beta was made into
an explicit condition on beta by Wilson and Shmida (1984), who
noted that without this condition, it would be difficult to
compare beta diversity between sets of communities whose alpha
diversities differed. These authors also noted that levels of
alpha and beta diversity may be established by different ecological
mechanisms, and should therefore be separated to permit independent
analysis.
Surprisingly,
the standard general definitions of beta (additive or multiplicative)
do not produce independent alpha and beta when applied to most
indices. For example, the additive definition produces independent
alpha and beta for Shannon entropy (-Sum p_i ln p_i) but not
for the Gini-Simpson index (1 - Sum p_i^2). This lack of independence
for most indices leads to very serious problems which can completely
derail an ecologist's analyses. A region consisting of a thousand
equally large, completely distinct communities, each with a hundred
equally common species (none shared between communities), would
be described by all biologists as a region with extremely high
beta diversity. A region with only two equally large communities,
each with 5 equally common species, and with three of these
species shared by both communities, is clearly a region of lower
beta diversity. Yet by the additive definition, the Gini-Simpson
beta of this second region (Hg - Ha = 0.04, gamma minus alpha) is four times higher
than that of the first (Hg - Ha = 0.01)! An ecologist who uses
the additive definition of beta with the Gini-Simpson index
is therefore very likely to reach incorrect conclusions. Anomalous
results also arise when the multiplicative definition is applied
to many indices, proving that ecologists do not yet have a general
mathematical theory of alpha and beta that correctly captures
our intuitive concepts.
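The two regions in this example can be checked numerically. The sketch below is only an illustration under stated assumptions: communities are equally weighted, each is represented as a dictionary mapping a species label to its relative frequency, and the function names and species labels are hypothetical. Alpha is taken as the mean Gini-Simpson index of the communities, and gamma as the Gini-Simpson index of the pooled frequencies.

```python
from collections import defaultdict

def gini_simpson(freqs):
    """Gini-Simpson index, 1 - sum(p_i ** 2)."""
    return 1.0 - sum(p * p for p in freqs)

def pooled_frequencies(communities):
    """Regional species frequencies for equally weighted communities."""
    n = len(communities)
    pooled = defaultdict(float)
    for community in communities:
        for species, p in community.items():
            pooled[species] += p / n
    return dict(pooled)

def additive_gini_simpson_beta(communities):
    """Additive 'beta': gamma Gini-Simpson minus mean alpha Gini-Simpson."""
    alpha = sum(gini_simpson(c.values()) for c in communities) / len(communities)
    gamma = gini_simpson(pooled_frequencies(communities).values())
    return gamma - alpha

# Region 1: 1000 distinct communities, 100 equally common species each.
region_1 = [{(j, i): 1 / 100 for i in range(100)} for j in range(1000)]

# Region 2: two communities of 5 equally common species, 3 of them shared.
shared = {"s1": 0.2, "s2": 0.2, "s3": 0.2}
region_2 = [
    {**shared, "a1": 0.2, "a2": 0.2},
    {**shared, "b1": 0.2, "b2": 0.2},
]

print("region 1:", round(additive_gini_simpson_beta(region_1), 4))  # about 0.01
print("region 2:", round(additive_gini_simpson_beta(region_2), 4))  # about 0.04
```

The region with a thousand completely distinct communities receives the smaller additive Gini-Simpson "beta", which is the anomaly described above.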
This
situation can be remedied by taking the independence condition
as an axiom and deriving (instead of inventing) the
proper relationship between alpha and beta for each index. This,
plus a few other uncontroversial axioms, is enough to generate
a complete new mathematics of alpha and beta for all standard
diversity indices. This is the subject of my Ecology
paper, Partitioning
diversity into independent alpha and beta components.
It turns out that the relation between alpha and beta depends
on the index; there is no universal additive or multiplicative
rule relating the alpha and beta components of an index. However,
when these alpha and beta components of any index are converted
to effective number of elements, they all follow a generalized version of Whittaker's multiplicative
law, for all indices! That is, the effective
number of species per community (the alpha diversity) times
the effective number of distinct communities (the beta diversity)
equals the effective number of species of the region (the gamma
diversity). Also, it turns out that alpha and beta diversities
(the effective number of elements) can be calculated directly
from species frequencies and community weights without bothering
to calculate diversity indices. Ecologists' most popular similarity
and overlap indices, like the Jaccard, Sorensen, Horn,
and Morisita-Horn indices, are just monotonic transformations
of this new beta diversity.
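A minimal sketch of this direct calculation follows. It rests on several assumptions: communities are equally weighted, the order q is any value other than 1 (the Shannon case needs a separate limit and is not covered), and the function and variable names are hypothetical. Alpha is computed as the effective number of species corresponding to the mean of Sum p_i^q across communities, and gamma as the effective number for the pooled frequencies. Beta is then obtained as gamma divided by alpha, so the multiplicative law holds by construction in this sketch. The example reuses a region of two communities with 5 equally common species each, 3 of them shared.

```python
from collections import defaultdict

def basic_sum(freqs, q):
    """Sum of p_i ** q over the species that are present."""
    return sum(p ** q for p in freqs if p > 0)

def partition(communities, q):
    """Return (alpha, beta, gamma) as effective numbers.

    Assumes equally weighted communities and q != 1.
    """
    if q == 1:
        raise ValueError("q = 1 (the Shannon case) is not covered by this sketch")
    n = len(communities)
    pooled = defaultdict(float)
    for community in communities:
        for species, p in community.items():
            pooled[species] += p / n
    exponent = 1.0 / (1.0 - q)
    alpha = (sum(basic_sum(c.values(), q) for c in communities) / n) ** exponent
    gamma = basic_sum(pooled.values(), q) ** exponent
    beta = gamma / alpha  # effective number of distinct communities
    return alpha, beta, gamma

shared = {"s1": 0.2, "s2": 0.2, "s3": 0.2}
region = [
    {**shared, "a1": 0.2, "a2": 0.2},
    {**shared, "b1": 0.2, "b2": 0.2},
]

# Order 2 (the order associated with the Gini-Simpson index).
alpha2, beta2, gamma2 = partition(region, q=2)
print("q=2:", round(alpha2, 3), round(beta2, 3), round(gamma2, 3))

# Order 0 (species richness): for two equally weighted communities,
# the Sorensen and Jaccard indices are simple functions of beta.
alpha0, beta0, gamma0 = partition(region, q=0)
sorensen = 2 - beta0
jaccard = 2 / beta0 - 1
print("q=0:", round(beta0, 3), round(sorensen, 3), round(jaccard, 3))
```

For this region the order-2 values are about 5 effective species per community, 1.25 effective communities, and 6.25 effective species in the region. At order 0 the two transformations reproduce the usual presence-absence values for 3 shared species out of 5 and 5: a Sorensen index of 0.6 and a Jaccard index of 3/7.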
If
you only want to know what to do and how to do it, I provide
examples of different kinds of diversity analyses in the links
below. But I would hope that some readers would want to know
why. Part 1: Theoretical background is for
those readers.
I
have produced a critical review of Anne Magurran's popular book,
Measuring Biological Diversity. This book exemplifies the standard
view of diversity measures, and the review serves to explain
why these views are wrong. Click
here to see the review.
I
would like this to be a useful reference site for professional
ecologists and students. Any questions, comments, or opposing
views are welcome. Write me at the address that appears at the
top of my home page.
(Note:
it will take me some time to make all these pages, so this website
will not be complete for a while.)