Dot plot : (also known as a bivariate display, "scattergram" or bitmap).
Objective: To understand the correlated nature of standard list mode data and why this is important.
This type of display plots one dot or point for each cell which passed through the instrument, positioned according to that cell's values for parameters x and y.
Dot plots are good for detecting small numbers of events which are well separated from the main populations of cells present, but they give little or no indication of the relative density of events within populations. This is particularly true for large data files, where many events are drawn on top of one another, and it is one reason for using a density plot.
Density plot :
Density plots simulate a three dimensional display of events, with the "third" parameter being the number of events. Thus the x and y values are the same as for a dot plot, and colours (or sometimes shades of grey) indicate the relative numbers of events. This type of display can make discrete populations clearer to the user. However, the pitfall with density plots, which also holds true for contour plots, is that the options selected can dramatically emphasise, or hide altogether, populations of cells. I'll deal with this topic more fully below, but first let me mention the least widely used type of display, the contour plot.
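To make the idea of the "third" parameter concrete, here is a minimal Python sketch which bins two parameters into a grid and counts the events in each bin. It is not how CELLQuest, WinMDI or Flowjo draw their density plots; the synthetic populations, the 64 x 64 grid and the use of NumPy are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented two-parameter events on a 256-channel scale:
# a large main population plus a small, well separated one.
main = rng.normal(loc=(60, 80), scale=8, size=(20000, 2))
rare = rng.normal(loc=(200, 190), scale=5, size=(100, 2))
events = np.clip(np.vstack([main, rare]), 0, 255)

# Bin the events into a 64 x 64 grid. Each cell of `counts` holds a number
# of events, which is the value a density plot maps to a colour or grey shade.
counts, x_edges, y_edges = np.histogram2d(
    events[:, 0], events[:, 1], bins=64, range=[[0, 256], [0, 256]]
)

print("total events binned:", int(counts.sum()))
print("highest count in one bin:", int(counts.max()))
print("occupied bins:", int((counts > 0).sum()))
```

A dot plot of the same events would only show which bins are occupied, whereas the counts show how heavily each one is occupied.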
Contour plots :
Contour plots are another two dimensional display of the relative amounts of two parameters, X and Y, with contour lines drawn through the x and y co-ordinates which have similar numbers of cells. So think about this as X versus Y, with Z, the counts, coming out towards you. Some people find a mountain analogy helpful, with each contour line representing a change in elevation (in number of events).
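The pitfall mentioned earlier, that the options selected can emphasise or hide populations, can be sketched in Python. The example below compares two hypothetical sets of contour levels on invented data. The populations, the grid size and both level schemes are assumptions for illustration and do not reproduce the contouring options of any particular package.

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented data: a main population of 20000 events and a small one of 200.
main = rng.normal(loc=(60, 80), scale=8, size=(20000, 2))
small = rng.normal(loc=(200, 190), scale=5, size=(200, 2))
events = np.clip(np.vstack([main, small]), 0, 255)

counts, _, _ = np.histogram2d(
    events[:, 0], events[:, 1], bins=64, range=[[0, 256], [0, 256]]
)

peak = counts.max()                   # "summit" of the main population
small_peak = counts[32:, 32:].max()   # summit of the small one (channels 128-255)

# Two hypothetical choices of contour "elevations".
linear_levels = peak * np.array([0.2, 0.4, 0.6, 0.8])   # evenly spaced
log_levels = peak / 2.0 ** np.arange(7, 0, -1)          # peak/128 ... peak/2


def has_contour(summit, levels):
    """A population is outlined only if its summit rises above the lowest level."""
    return bool(summit > levels.min())


print("small population drawn with linear levels:", has_contour(small_peak, linear_levels))
print("small population drawn with log levels:   ", has_contour(small_peak, log_levels))
```

In the mountain analogy, evenly spaced levels only start drawing lines a fair way up the tallest mountain, so a low hill elsewhere on the map never gets a line at all.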
Hopefully this section has provided a brief refresher for those of you who are familiar with flow data analysis, and a simple overview for those of you who are new to this area.
Data Analysis Approaches
This section involves the use of two of the software packages that we will utilise in this workshop, namely CELLQuest and WinMDI. CELLQuest operates on the Macintosh platform, while WinMDI is available for use on IBM compatible PCs and on Macintosh computers running Virtual PC. A third software package, Flowjo, will also be used; however, as it operates in a slightly different manner, it will be treated separately.
In these exercises you will deal with real data and consider some of the factors which should influence the data display method you choose to use.
Data File Format:
The data generated by most commercial flow cytometers are stored in Flow Cytometry Standard (FCS) format. Within this standard the most common mode of storage is List Mode. List mode data files have a text header followed by the data values, stored sequentially in the order in which they were generated as each cell passed through the instrument.
Text header example:
FCS2.0 256 1365 1536 101535 0 0 8DHAP001\$DATE\8-Apr-98\$DATATYPE\I\$MODE\L\$NEXTDATA\0\$BYTEORD\4,3,2,1\$SYS\BD - LYSYS II Version 1.0 11/90 - HP Pascal 3.22\SAMPLE ID\ 2-Height\$P5B\8\$P5R\256\$P5N\FL3-H\$G5N\FL3-H\$P5S\FL3-Height\$CYT\FACSCAN\TEST PULSES\FALSE\AUTOSAMPLER\FALSE\PP\FALSE\LASER POWER\15.05\LASER MP\137,100,100,100,100\L2$MODE\1,1,0,0,0\L2$COMP\0,0,0,0\CREATOR\LYSYS\$DFC2TO1\ 0.0\$DFC1TO2\ 0.0\$DFC3TO2\ 0.0\$DFC2TO3\ O\15.05\$P3O\15.05\$P4O\15.05\$P5O\15.05\P1M\0\P2M\0\P3M\4\P4M\4\P5M\4\CONVERTED BY\FACSConvert v1.0\
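The first part of the file header simply states which FCS format is being used (FCS2.0), where the text information starts (256) and stops (1365), and then where the binary data starts (1536) and stops (101535). Following this part of the header is a variety of fields, each one separated by a \ sign, with the standard FCS keywords starting with the $ character. Here are some important ones from the header above:
$DATE\8-Apr-98 : the date on which the data were acquired.
$DATATYPE\I : the data values are stored as integers.
$MODE\L : the data are stored in list mode.
$BYTEORD\4,3,2,1 : the byte order used when the data were written.
$P5N\FL3-H : the short name of parameter 5.
$P5R\256 : the range (number of channels) of parameter 5.
$CYT\FACSCAN : the type of cytometer used.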
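As a rough illustration of how this layout can be read by a program, here is a minimal Python sketch. It assumes the fixed-width FCS header layout (a 6 character version string, 4 spaces, then six right-justified 8 character offsets) and a text segment whose first character is the delimiter. The function names are hypothetical, the text segment is a shortened version of the example above, and escaped delimiters are not handled.

```python
def parse_header(header: bytes) -> dict:
    """Read the FCS version string and the six ASCII byte offsets that follow it."""
    names = ["text_start", "text_end", "data_start", "data_end",
             "analysis_start", "analysis_end"]
    info = {"version": header[:6].decode("ascii")}
    for i, name in enumerate(names):
        field = header[10 + 8 * i: 18 + 8 * i].decode("ascii").strip()
        info[name] = int(field or 0)   # treat a blank field as 0
    return info


def parse_text_segment(text: str) -> dict:
    """Split a text segment into keyword/value pairs.

    The first character is the delimiter (a backslash in the example header);
    keywords and values then alternate.
    """
    delimiter = text[0]
    parts = text[1:].split(delimiter)
    if parts and parts[-1] == "":
        parts.pop()                    # drop the empty piece after the final delimiter
    return dict(zip(parts[0::2], parts[1::2]))


# Rebuild a header line carrying the offsets quoted in the example.
offsets = (256, 1365, 1536, 101535, 0, 0)
header = ("FCS2.0" + " " * 4 + "".join(f"{n:>8}" for n in offsets)).encode("ascii")
info = parse_header(header)

# A shortened text segment using a few of the fields from the example.
text = "\\$DATE\\8-Apr-98\\$DATATYPE\\I\\$MODE\\L\\$BYTEORD\\4,3,2,1\\$P5N\\FL3-H\\$P5R\\256\\"
fields = parse_text_segment(text)

n_data_bytes = info["data_end"] - info["data_start"] + 1
print(info)
print(fields["$MODE"], fields["$P5N"], n_data_bytes)
```

The data segment here spans 100000 bytes. If all five parameters were stored as 8 bit values (the header shows this only for parameter 5, so this is an assumption), that would correspond to 20000 events of 5 bytes each.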
Histogram analysis:
Objective: To become familiar with histogram display options
of flow cytometry data.
As previously discussed when considering the various methods of displaying flow cytometry data, single parameter frequency histograms are one of the simplest methods, yet they are often misused. Below is an overlay plot showing data from a negative control file and a test sample.
Please look closely at this histogram and let's discuss what it shows.
Firstly, it shows that the control file has a reasonably homogeneous level of background fluorescence, that the data were collected using a four decade log amplifier, and that the sample size was large enough to be representative of the population. The basic premise of setting the background fluorescence of an unstained sample so that it falls in the first decade of the log scale is that any staining, be it specific or non specific, will move events further up the log scale.
The test sample, on the other hand, shows a bimodal distribution, with a percentage of the cells staining with the antibody/fluorochrome of interest, while other events show little fluorescence.
Nothing can be said about the other parameters in either file based on the fluorescence histogram, hence the popularity of bivariate displays.
This leads us to ask: in which situations would you choose to use a histogram for data analysis?
The best answer to this question may be that a histogram is the most suitable method for data analysis when working with populations of cells which are homogeneous in nature, or when events which are not of interest have been removed from the sample prior to display as a histogram. An example of this would be when an antibody titration is being carried out to determine the correct amount of antibody to use.
Quite clearly, histograms can be effective for these types of direct visual comparisons, but if one is interested in quantifying the level of change in fluorescence, then either a traditional subjective approach, using markers or regions, or a non subjective approach, such as histogram subtraction, must be used.
The setting of markers or bounds on histogram displays based on visual examination of the data is subjective, yet it can yield meaningful results provided the data set contains only cells of interest, and the histogram used for comparison is well separated from the test data.
Let's overlay some data files and look at how a marker is set on the resulting histogram.
Look closely: the marker has been set based on the leading edge of the control file (negative), and the percentage of events in the marker region is then displayed, firstly for the control file and then for the overlay. The median and mean for each curve can be seen; the top line shows the statistics for the whole display, and the bottom line shows the statistics for the marker region.
As soon as you start looking at the percentage of cells in the marker region, it would be reasonable to say that the marker has been set in the wrong position, in this case too far to the left on the histogram display, as 4.1 percent of the negative control file falls in the marker region. One take home message from this is to maximise the size of the display window when drawing regions and markers, to make it easier to set the correct position and increase the meaningfulness of the statistics you are generating. Ideally, the marker would be positioned such that less than one percent of the control file's background falls in the marker region.
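The "less than one percent of the control" guideline can also be expressed numerically. The following Python sketch places the lower bound of a marker at the 99th percentile of a control sample and then reports the percentage of events in the marker region for both files. The data are synthetic, the function name is hypothetical, and choosing the marker by percentile is an assumption made for illustration rather than a feature of the software used in this workshop.

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented channel values for a negative control and a test sample.
control = rng.normal(loc=40, scale=10, size=10000)
test = np.concatenate([
    rng.normal(loc=40, scale=10, size=5000),    # unstained fraction
    rng.normal(loc=150, scale=20, size=5000),   # stained fraction
])


def percent_in_marker(values, lower_bound):
    """Percentage of events at or above the marker's lower bound."""
    return 100.0 * np.mean(values >= lower_bound)


# Place the marker so that about 1% of the control events fall inside it.
marker = np.percentile(control, 99)

control_percent = percent_in_marker(control, marker)
test_percent = percent_in_marker(test, marker)

print(f"marker lower bound: channel {marker:.1f}")
print(f"control in marker:  {control_percent:.2f}%")
print(f"test in marker:     {test_percent:.2f}%")
```

Moving `marker` to a lower channel raises both percentages, which is the effect seen above when 4.1 percent of the control fell inside the marker region.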
Now that we have generated statistics to quantify the level of positive expression and to assess the relative level of fluorescence, this is a good time to deal with the statistics most commonly used in flow cytometry. We'll come back to these histograms shortly, consider the statistics we've generated, and then work through another approach to histogram analysis. Some more information regarding statistics can be found in the STATISTICS section below.
While the statistics generated are valid for the data, there is a better, less subjective manner in which to approach this. To do so, we have to use a method known as histogram subtraction, which is designed to calculate the "true" percentage of positively labelled cells, and also the relative shift in a test sample compared to the control.
Next, compare the results of a histogram subtraction using CELLQuest and Flowjo, working with the same two data files which were overlaid earlier.
If all went to plan, then the histogram overlay you were working with should now show an additional line which represents the difference between the two histograms. Note that by altering the order of the data files in the histogram tools prior to selection for histogram subtraction, the opposite subtraction to the one you have just completed would be done.
If you do histogram analysis regularly, and you would normally set markers subjectively, look to see if this subtraction feature is available in your software. If it is available, try utilising it in typical analyses and see how it compares with your current approach. It is worth noting that histogram subtraction is not widely used in the published literature, for a number of reasons, some of which are historical, and that this method may be criticised by some reviewers. Keep this in mind, and build your confidence in the method prior to its use.
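For those who like to see the idea spelled out, here is a minimal Python sketch of a channel-by-channel subtraction. It is a simplified illustration only and is not the algorithm implemented in CELLQuest, Flowjo or any other package. The function name and the toy 8 channel histograms are invented, and scaling the control to the test sample's event count and setting negative differences to zero are both assumptions.

```python
import numpy as np


def percent_positive_by_subtraction(control_hist, test_hist):
    """Subtract a control histogram from a test histogram, channel by channel.

    Returns the percentage of test events left after subtraction, together
    with the difference histogram itself.
    """
    control = np.asarray(control_hist, dtype=float)
    test = np.asarray(test_hist, dtype=float)
    # Scale the control to the same total number of events as the test sample.
    control_scaled = control * test.sum() / control.sum()
    # Keep only the channels where the test sample exceeds the control.
    difference = np.clip(test - control_scaled, 0, None)
    return 100.0 * difference.sum() / test.sum(), difference


# Toy histograms: events per channel (counts are invented).
control_hist = [10, 40, 30, 15, 5, 0, 0, 0]
test_hist = [5, 20, 15, 8, 12, 20, 15, 5]

percent, difference = percent_positive_by_subtraction(control_hist, test_hist)
print(f"percent positive: {percent:.1f}%")
print("difference histogram:", difference.tolist())
```

Swapping the two arguments performs the opposite subtraction, just as altering the order of the data files does in the histogram tools.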
STATISTICS :
Two factors which you should consider prior to using statistics on your population of interest are:
1. Sample size: the target population under consideration should be large enough that the statistics generated provide a reasonably precise estimate of the number of labelled cells of interest. The precision of the data is dependent on the number of cells counted, and as the number of events increases the coefficient of variation of the estimate decreases.
2. Choice of statistic: the effect that an incorrect choice of statistic can have on the relevance of the numbers that are generated.
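For the first factor, a short Python sketch shows how quickly the coefficient of variation falls as more events are counted. It assumes simple Poisson counting statistics, under which the CV of a count of n events is 100/sqrt(n) percent. That assumption and the function name are mine and are not stated in the text above.

```python
import math


def counting_cv_percent(n_events: int) -> float:
    """CV (%) of a count of n events, assuming Poisson counting statistics."""
    return 100.0 / math.sqrt(n_events)


for n in (100, 1000, 10000):
    print(f"{n:>6} events counted: CV = {counting_cv_percent(n):.1f}%")
```

Under this assumption, a one hundredfold increase in the number of events counted is needed to improve the CV tenfold.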
The most commonly used statistic in flow cytometry is the mean. There are two variants, which can be defined thus:
arithmetic mean = $(x_1 + x_2 + \dots + x_n)/n$,
geometric mean = $(x_1 \times x_2 \times x_3 \times \dots \times x_n)^{1/n}$.
The arithmetic mean is well suited to the analysis of data collected on a linear scale while, as a general rule, the geometric mean is better suited for use with data collected with a logarithmic amplifier.
The arithmetic mean is easily skewed by small numbers of outlying events, while the geometric mean will often prove to be a good indicator of the central tendency of a population. However, many researchers feel that the arithmetic mean of a population is a "truer" representation of the data, and want to know if there are small numbers of outlying events.
The median is the other widely used statistic. It is robust, and provides a good indication of the central tendency of the population.
Median = the 50th percentile, that is, the midpoint in a series of events arranged sequentially by increasing intensity.
The median is the preferred statistic for many analyses, as it will still yield meaningful results in a variety of situations where events accumulate in the end channels of the fluorescence histogram display.
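The behaviour of the three statistics described above can be checked with a few lines of Python, using only the standard library. The intensity values are invented, and the two outlying events stand in for events which have piled up at the top of the scale.

```python
import statistics


def summarise(values):
    """Return the three statistics discussed above for a list of intensities."""
    return {
        "arithmetic_mean": statistics.fmean(values),
        "geometric_mean": statistics.geometric_mean(values),
        "median": statistics.median(values),
    }


population = [10, 12, 11, 9, 10, 11, 10]     # invented channel values
with_outliers = population + [1000, 1000]    # two outlying events added

before = summarise(population)
after = summarise(with_outliers)

for name in before:
    print(f"{name:>15}: {before[name]:8.2f} -> {after[name]:8.2f}")
```

The two outlying events pull the arithmetic mean up the most, the geometric mean rather less, and the median hardly at all.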
http://jcsmr.anu.edu.au/facslab/statistics.html
Look at the statistics that you generated earlier, and at the differences between the arithmetic mean, the geometric mean and the median. Note that for different data there are variations in the values of the three statistics, and now think about which statistic is the most appropriate to use, given what you have just read above.
Software packages such as Winlist (Verity Software House) can be used to subtract one histogram, the control, from the sample histogram.
You should see a small, newly created histogram in the window. This new histogram represents the differences between the control and the test sample for the parameter being assessed. As you are not setting any bounds in a subjective manner, it is a simple, objective measure of the differences between the two populations. The numbers in the small statistics box show the differences between the two populations. This type of data, where the test sample shows a comparatively small shift in relative fluorescence, is well suited to histogram subtraction as opposed to a "set marker" type approach, as with this type of experiment one has difficulty in clearly deciding where a marker could, or should, be set.
Consider the percentage of positive cells that each method reports, 52.25% for CELLQuest and 52.2% for Flowjo, and now compare these results with that obtained manually using the set marker approach. The manual method above shows 56.18% positive, and if we subtract the 4.10% of the control which fell in the marker region initially, then the resultant manual percentage of positive cells is 52.08%, which is very close to the subtracted result.
Further reading:
"Introduction to Flow Cytometry" James V. Watson. Cambridge University Press
ISBN 0 521 38061 8
and an oldie but a goody,
"Flow Cytometry: Instrumentation and Data Analysis", edited by Marvin A. Van Dilla, Phillip N. Dean, Ole D. Laerum, and Myron R. Melamed, Academic Press, 1985. ISBN 012 712150 1
Other interesting reading on histogram analysis:
Lampariello F. (et al) Cytometry 1994; 15:294-301, Cytometry 1998; 32:241-254, Cytometry 2000; 38:179-188
Watson J.V. Cytometry 2001; 43:55-68