Flow Cytometry Software Workshop: 2000

© G.W.Osborne

Introduction:
The aim of this flow cytometry data analysis workshop is to foster objective data analysis strategies. The first, and most critical, step in this process is to clearly identify the question(s) you are asking. Is the percentage of cells expressing the antigen of interest the most important point, or the relative level of expression, or the absolute number of cells with a particular characteristic? Once a clear purpose for the analysis has been established, you can consider which of the available options will give you the most meaningful and reproducible results, results which hopefully reflect the underlying biology from which the data set was collected.

This web-based version of the workshop lacks the majority of the "hands on" computer lessons (and of course my witty diatribe). However, it provides examples of key points as graphics, some of which are active. I'd encourage readers to look closely at the data displays and, in light of the associated discussions, come to their own conclusions as to the relevance of a particular approach to their own data set.



Data Display


Firstly, let's look at an overview of the various types of data displays.
The simplest way of displaying flow cytometry data is the frequency histogram.

histogram graphic

Frequency histograms display relative fluorescence or scattered light signals plotted against the number of events. The simplicity of this type of display is its main attraction. To see the relative levels of other parameters which were collected at the same time, one needs to use one of the forms of bivariate display, namely dot, density or contour plots.
In these types of display, one parameter is plotted against another in an X versus Y axis display.
Let's now look at each type of plot.

Dot plot : (also known as bivariate display, "scattergram" or bitmap).

dotplot graphics
This type of display plots one dot or point for each cell which passed through the instrument, positioned according to that cell's values for parameters x and y.
Dot plots are good for detecting small numbers of events which are well separated from the main populations of cells, but they give little or no indication of the relative density of events within populations. This is particularly true for large data files, and is one reason for using a density plot.

Density plot :

density plot graphic
Density plots simulate a three dimensional display of events, with the "third" parameter being the number of events. Thus the x and y values are the same as for a dot plot, and colours (or sometimes shades of grey) indicate the relative numbers of events. This type of display can make discrete populations clearer to the user. However, the pitfall with density plots, which also holds true for contour plots, is that the display options selected can dramatically emphasise populations of cells, or hide them altogether. I'll deal with this topic more fully below, but let me just mention the least widely used type of display, the contour plot.

Contour plots :

contour plot graphic
Contour plots are another two dimensional display of the relative X and Y amounts of two parameters, with contour lines drawn to join x and y co-ordinates which have similar numbers of cells. So think about this as X versus Y, with Z, the counts, coming out towards you. Some people find a mountain analogy helpful, with each contour line representing a change in elevation (in number of events).
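All three bivariate displays can be thought of as views of the same table of counts. The sketch below is only an illustration using made-up data (the populations, the bin width and the contour thresholds are all my assumptions, not settings from any of the packages used in this workshop). It bins simulated events into a two dimensional table, then shows how a choice of contour thresholds can hide a small population that a dot plot would still show.

```python
import random
from collections import Counter

CHANNELS = 256  # assumed 8-bit data: channel numbers 0 to 255


def simulate_population(n, centre_x, centre_y, spread, rng):
    """Return n made-up (x, y) channel pairs scattered around a centre."""
    events = []
    for _ in range(n):
        x = min(CHANNELS - 1, max(0, round(rng.gauss(centre_x, spread))))
        y = min(CHANNELS - 1, max(0, round(rng.gauss(centre_y, spread))))
        events.append((x, y))
    return events


def bin_counts(events, bin_width=4):
    """Count the events falling in each (x, y) bin: the "third" parameter."""
    return Counter((x // bin_width, y // bin_width) for x, y in events)


def contour_level(count, thresholds):
    """Number of contour thresholds that a bin count reaches (0 = not drawn)."""
    return sum(count >= t for t in thresholds)


rng = random.Random(1)
main_population = simulate_population(10000, 60, 60, 12, rng)
rare_population = simulate_population(50, 200, 200, 12, rng)
counts = bin_counts(main_population + rare_population)

thresholds = [10, 50, 100]  # hypothetical display option
# Bins well away from the main population contain only rare events.
rare_bins = {b: c for b, c in counts.items() if b[0] > 35 and b[1] > 35}

print("occupied bins in the rare region:", len(rare_bins))
print("highest contour level, rare region:",
      max(contour_level(c, thresholds) for c in rare_bins.values()))
print("highest contour level, whole plot:",
      max(contour_level(c, thresholds) for c in counts.values()))
```

With these made-up numbers the rare population occupies a number of bins, so it would appear on a dot plot, yet none of its bins reaches the lowest threshold, so it would be absent from a contour plot drawn with these levels. Lowering the lowest threshold brings it back.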

Hopefully this section has provided a brief refresher for those of you who are familiar with flow data analysis, and a simple overview for those of you who are new to this area.

Data Analysis Approaches

This section involves the use of two of the software packages that we will utilise in this workshop, namely CELLQuest and WinMDI. CELLQuest operates on the Macintosh platform, while WinMDI is available for use on IBM compatible PCs and on Macintosh computers running Virtual PC. A third software package, Flowjo, will also be used; however, as it operates in a slightly different manner, it will be treated separately.

In these exercises you shall deal with real data and consider some of the factors which should influence the data display method you choose to use.

Data File Format:

Objective: To understand the correlated nature of standard list mode data and why this is important.

The data generated by most commercial flow cytometers is stored in Flow Cytometry Standard (FCS) format. Within this standard the most common mode of storage is List Mode. List mode data files have a text header followed by the data values, stored sequentially in the order in which they were generated by each cell as it passed through the instrument.

Text header example:

FCS2.0 256 1365 1536 101535 0 0 8DHAP001\$DATE\8-Apr-98\$DATATYPE\I\$MODE\L\$NEXTDATA\0\$BYTEORD\4,3,2,1\$SYS\BD - LYSYS II Version 1.0 11/90 - HP Pascal 3.22\SAMPLE ID\ 2-Height\$P5B\8\$P5R\256\$P5N\FL3-H\$G5N\FL3-H\$P5S\FL3-Height\$CYT\FACSCAN\TEST PULSES\FALSE\AUTOSAMPLER\FALSE\PP\FALSE\LASER POWER\15.05\LASER MP\137,100,100,100,100\L2$MODE\1,1,0,0,0\L2$COMP\0,0,0,0\CREATOR\LYSYS\$DFC2TO1\ 0.0\$DFC1TO2\ 0.0\$DFC3TO2\ 0.0\$DFC2TO3\ O\15.05\$P3O\15.05\$P4O\15.05\$P5O\15.05\P1M\0\P2M\0\P3M\4\P4M\4\P5M\4\CONVERTED BY\FACSConvert v1.0\

The first part of the file header simply states which FCS format is being used, where the text information starts (256) and stops (1365), and then where the binary data starts (1536) and stops (101535). Following this is a variety of fields, each one separated by a \ character, with the standard keywords starting with the $ character. Here are some important ones from this file's header:
  1. \$TOT\20000 The total number of events collected
  2. \$FIL\8DHAP001 The file name used for the file
  3. \$MODE\L The data "mode", in this case List mode
  4. \$P1B\8\$P1R\256 The number of data bits, 8, and the data range, 256 channels
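To make the layout of the header more concrete, here is a minimal sketch of how these values could be read. The function names are my own, and the parser is deliberately simple: it assumes the offsets are separated by spaces and that no field value contains the delimiter character. The parameter count of five is also an assumption, inferred from the $P5... keywords in the example header.

```python
def parse_offsets(header_line):
    """Read the FCS version and the TEXT and DATA byte offsets."""
    version, *numbers = header_line.split()
    text_start, text_end, data_start, data_end = (int(n) for n in numbers[:4])
    return version, (text_start, text_end), (data_start, data_end)


def parse_text_segment(text):
    """Split a delimited TEXT segment into a keyword -> value dictionary."""
    delimiter = text[0]  # a backslash in the example header
    tokens = text[1:].split(delimiter)
    if tokens and tokens[-1] == "":
        tokens.pop()  # drop the empty token after the closing delimiter
    return dict(zip(tokens[0::2], tokens[1::2]))


version, text_span, data_span = parse_offsets("FCS2.0 256 1365 1536 101535 0 0")
fields = parse_text_segment(
    r"\$TOT\20000\$FIL\8DHAP001\$MODE\L\$P1B\8\$P1R\256" + "\\"
)

# Offsets are taken to be inclusive, so the DATA length is end - start + 1.
data_bytes = data_span[1] - data_span[0] + 1

# Assumption: five parameters of 8 bits each (see the lead-in above).
n_parameters = 5
expected_bytes = int(fields["$TOT"]) * n_parameters * int(fields["$P1B"]) // 8

print(version, text_span, data_span)
print(fields)
print(data_bytes, expected_bytes)
```

Both byte counts come to 100000, which is a useful sanity check: 20000 events, each with five one-byte values, exactly fills the data segment described by the header offsets.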
The reason I've drawn your attention to the last two fields is to emphasise that the number of bits and the channel range are fixed for each parameter, and that the stored values are on a linear scale. Data which is displayed on a log scale in any of the plot formats you will see later is therefore data from a log amplifier which has been stored in linear channels; the appropriate number of log decades, in this case 4, is applied when the data is displayed and when statistics are calculated.

The data following the header is correlated, in that all the recorded values for a particular cell are stored together in sequence, so the recorded level of one parameter may be compared with any other for a given particle of interest. This correlation allows the data to be plotted in multi-dimensional space, an attribute which is utilised in many analyses where one wishes to calculate the relative amount of an attribute A compared to attribute B or C.
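The two ideas in this section, correlated list mode storage and linear channels representing a log scale, can be sketched in a few lines. This is an illustration only: the byte values are made up, the parameter order is hypothetical, and the conversion assumes 256 channels spread evenly over 4 decades with the bottom channel representing a relative intensity of 1.

```python
def split_into_events(raw, n_parameters):
    """Group a flat list mode byte sequence into one tuple per event."""
    return [
        tuple(raw[i:i + n_parameters])
        for i in range(0, len(raw), n_parameters)
    ]


def channel_to_relative(channel, channels=256, decades=4):
    """Convert a linear channel number to a relative intensity on a log scale."""
    return 10 ** (channel * decades / channels)


# Made-up 8-bit list mode data: three events, five values per event.
# Hypothetical parameter order: FSC, SSC, FL1, FL2, FL3.
raw = bytes([
    100, 80, 64, 10, 5,
    120, 90, 128, 12, 6,
    90, 70, 192, 200, 7,
])

events = split_into_events(raw, 5)
for event in events:
    fl1_channel = event[2]
    print(event, "FL1 relative intensity:", round(channel_to_relative(fl1_channel), 1))
```

Because each tuple holds every value recorded for one cell, any parameter can be compared with any other for that same cell. Data stored with more than 8 bits per value would also need the byte order given in the header to be taken into account, which this sketch does not attempt.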



Histogram analysis:

Objective: To become familiar with histogram display options of flow cytometry data.

As previously discussed when considering the various methods of displaying flow cytometry data, single parameter frequency histograms are one of the simplest methods, yet they are often misused. Below is an overlay plot showing data from a negative control file and a test sample.

histogram overlay 1

Please look closely at this histogram and let's discuss what it shows.

Firstly, it shows that the control file has a reasonably homogeneous level of background fluorescence, that the data was collected using a four decade log amplifier, and that the sample size was large enough to be representative of the population. The basic premise of setting the background fluorescence of an unstained sample so that it falls in the first decade of the log scale is that any staining, be it specific or non-specific, will move events further up the log scale.
The test sample, on the other hand, shows a bimodal distribution, with a percentage of the cells staining with the antibody/fluorochrome of interest, while other events show little fluorescence.

Nothing can be said about the other parameters in either file based on the fluorescence histogram alone, hence the popularity of bivariate displays.

This leads us to ask: in which situations would you choose to use a histogram for data analysis?
The best answer to this question may be that a histogram is the most suitable method when working with populations of cells which are homogeneous in nature, or when events which are not of interest have been removed from the sample prior to display as a histogram. An example of this would be an antibody titration carried out to determine the correct amount of antibody to use.
Quite clearly histograms can be effective for these types of direct visual comparison, but if one is interested in quantifying the level of change in fluorescence, then either a traditional subjective approach, using markers or regions, or a non-subjective approach, such as histogram subtraction, must be used.

Setting markers or bounds on histogram displays based on visual examination of the data is subjective, yet it can yield meaningful results provided the data set contains only cells of interest, and the data with which the initial histogram is to be compared is well separated from the test data.

Let's overlay some data files. Observe the data file below, then place your mouse over the image; after a moment the image will change to show the overlay. Click on the image to view some formatting options:

Look closely. The marker has been set based on the leading edge of the control file (negative), and the percentage of events in the marker region is displayed, firstly for the control file and then for the overlay. The median and mean for each curve can be seen; the top line shows the statistics for the whole display, and the bottom line shows the statistics for the marker region.
As soon as you start looking at the percentage of cells in the marker region, it would be reasonable to say that the marker has been set in the wrong position, in this case too far to the left on the histogram display, as 4.1 percent of the events in the negative control file fall in the marker region. One take home message from this is to maximize the size of the display window when drawing regions and markers, which should make it easier to set the correct position and increase the meaningfulness of the statistics you are generating. Ideally, the marker would be positioned such that less than one percent of the control file's background falls in the marker region.
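The "less than one percent of the control" guideline can be written down as a small procedure. The sketch below is only an illustration: the two histograms are made-up 16 channel examples, and the function names are my own rather than anything provided by CELLQuest, WinMDI or Flowjo.

```python
def percent_in_marker(histogram, marker_channel):
    """Percentage of events at or above marker_channel."""
    return 100.0 * sum(histogram[marker_channel:]) / sum(histogram)


def marker_from_control(control, max_percent=1.0):
    """Lowest channel leaving no more than max_percent of the control in the marker."""
    for channel in range(len(control)):
        if percent_in_marker(control, channel) <= max_percent:
            return channel
    return len(control)


# Made-up histograms: events per channel, 1000 events in each file.
control = [0, 50, 200, 400, 250, 80, 15, 4, 1, 0, 0, 0, 0, 0, 0, 0]
test = [0, 20, 100, 200, 120, 40, 20, 30, 60, 120, 160, 90, 30, 8, 2, 0]

marker = marker_from_control(control)
print("marker channel:", marker)
print("control in marker (%):", percent_in_marker(control, marker))
print("test in marker (%):", percent_in_marker(test, marker))
```

For these made-up files the marker lands on channel 7, leaving 0.5 percent of the control and 50 percent of the test sample in the marker region. Setting the marker one channel lower would put 2 percent of the control in the region, which is the kind of error described above.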

Now that we have generated statistics to quantify the level of positive expression, this is a good time to deal with the statistics most commonly used in flow cytometry, including those used to assess the relative level of fluorescence. We'll come back to these histograms shortly, consider the statistics we've generated, and then work through another approach to histogram analysis.

STATISTICS :
Two factors which you should consider prior to using statistics on your population of interest are:
        1. Sample size: the target population under consideration should be large enough that the statistics generated provide a reasonably precise estimate of the number of labelled cells of interest. The precision of the data is dependent on the number of cells counted, and as the number of events increases the coefficient of variation of the estimate decreases.
        2. The effect that an incorrect choice of statistic can have on the relevance of the numbers that are generated.
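To put some numbers on the first point, here is a short sketch. It rests on an assumption that is not stated above: that the number of events counted in the population of interest follows Poisson counting statistics, in which case the coefficient of variation of the count is 100 divided by the square root of the number of events. The function names and the example frequency are my own.

```python
import math


def counting_cv_percent(n_events):
    """CV (%) of a count of n_events, assuming Poisson counting statistics."""
    return 100.0 / math.sqrt(n_events)


def events_needed(cv_percent):
    """Events needed in the population of interest for a target CV (%)."""
    return math.ceil((100.0 / cv_percent) ** 2)


for n in (100, 1000, 10000):
    print(n, "events -> CV of", round(counting_cv_percent(n), 2), "%")

# Hypothetical example: a population making up 1 in 1000 of the cells,
# measured with a target CV of 5 percent.
one_in = 1000
target = events_needed(5)
print("events of interest needed:", target)
print("total events to collect:", target * one_in)
```

Under this assumption, 100 events of interest give a CV of 10 percent and 10000 give 1 percent, so a hundredfold increase in events is needed for a tenfold gain in precision.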

The most commonly used statistic in flow cytometry is the mean. There are two variants, which can be defined thus:
arithmetic mean = (x1 + x2 + ... + xn) / n
geometric mean = (x1 × x2 × ... × xn)^(1/n), that is, the nth root of the product of the n values.
The arithmetic mean is well suited to the analysis of data that is collected on a linear scale, while as a general rule, the geometric mean is better suited for use with data collected with a logarithmic amplifier.
The arithmetic mean is easily skewed by small numbers of outlying events, while the geometric mean will often prove to be a good indicator of the central tendency of a population. However, many researchers feel that the arithmetic mean of a population is a "truer" representation of the data, precisely because they want to know if there are small numbers of outlying events.
The median is the other most widely used robust statistic, and provides a good indication of the central tendency of the population.
         Median = the 50th percentile: the midpoint in a series of events arranged in order of increasing intensity.
The median is the preferred statistic for many analyses as it will still yield meaningful results in a variety of situations where events accumulate in the end channels of the fluorescence histogram display.
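The behaviour of the three statistics described above is easy to see with a handful of made-up values. The sketch below uses Python's standard statistics module; the numbers are invented for illustration and do not come from the workshop data files.

```python
import statistics

# Made-up relative intensities: four dim events and one bright outlier.
values = [10, 10, 10, 10, 1000]

arithmetic = statistics.mean(values)
geometric = statistics.geometric_mean(values)  # requires Python 3.8 or later
median = statistics.median(values)

print("arithmetic mean:", arithmetic)
print("geometric mean: ", round(geometric, 2))
print("median:         ", median)

# Events piled up in the top channel: their true values are unknown, but the
# median is unchanged so long as fewer than half of the events are affected.
TOP = 10 ** 4  # top of a four decade scale
off_scale = [10, 20, 30, TOP, TOP]
median_off_scale = statistics.median(off_scale)
print("median with off scale events:", median_off_scale)
```

The single outlier pulls the arithmetic mean up to 208, while the geometric mean stays near 25 and the median stays at 10. In the second list the two off scale events could have any true value at or above the top channel without changing the median of 30.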

Some more information regarding statistics can be found here
http://jcsmr.anu.edu.au/facslab/statistics.html

Look at the statistics that you generated earlier. Look at the differences between the arithmetic mean, the geometric mean and the median. Note that the three statistics vary relative to one another from one data file to the next, and now think about which statistic is the most appropriate to use, given what you have just read above.

While the statistics generated are valid for the data, there is a better, less subjective manner in which to approach this. To do so, we have to use a method known as histogram subtraction which is designed to calculate the "true" percentage of positively labelled cells, and also the relative shift in a test sample compared to the control.

Software packages such as Winlist (Verity Software House) can be used to subtract one histogram, the control, from the sample histogram.
Winlist Histogram subtraction
You should see a small newly created histogram in the window. This new histogram represents the differences between the control and the test sample for the parameter being assessed. As you are not setting any bounds in a subjective manner, it is a simple, objective measure of the differences between the two populations. The numbers in the small statistics box show the differences between the two populations. This type of data, where the test sample shows a comparatively small shift in relative fluorescence, is well suited to histogram subtraction as opposed to a "set marker" type approach, as with this type of experiment it is difficult to decide clearly where a marker could, or should, be set.
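To show the general idea, here is a minimal sketch of the simplest form of histogram subtraction, done channel by channel with negative differences set to zero. I am not suggesting that this is the algorithm used by Winlist, CELLQuest or Flowjo, each of which may use a more refined method; the histograms are made up and the function name is my own.

```python
def subtract_histograms(test, control):
    """Channel by channel subtraction of a control histogram from a test histogram.

    The control is first scaled to the same total number of events as the
    test sample, and negative differences are set to zero.
    """
    scale = sum(test) / sum(control)
    return [max(t - c * scale, 0.0) for t, c in zip(test, control)]


# Made-up histograms: events per channel, 1000 events in each file.
control = [0, 50, 200, 400, 250, 80, 15, 4, 1, 0, 0, 0, 0, 0, 0, 0]
test = [0, 20, 100, 200, 120, 40, 20, 30, 60, 120, 160, 90, 30, 8, 2, 0]

difference = subtract_histograms(test, control)
percent_positive = 100.0 * sum(difference) / sum(test)

print("difference histogram:", difference)
print("percent positive:", percent_positive)
```

For these made-up files the difference histogram contains 500 of the 1000 test events, giving 50 percent positive without any marker having been drawn. Swapping the two arguments performs the opposite subtraction.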

Next, compare the results of a histogram subtraction using CELLQuest and Flowjo, working with the same two data files which were overlaid earlier.
CELLQuest Histogram subtraction
Flowjo Histogram subtraction

If all went to plan, then the histogram overlay you were working with should now show an additional line which represents the difference between the two histograms. Note that by altering the order of the data files in the histogram tools prior to selection for histogram subtraction, the opposite subtraction to the one you have just completed would be done.
Consider the percentage of positive cells that each method reports: 52.25% for CELLQuest and 52.2% for Flowjo. Now compare these results with that obtained manually using the set marker approach. The manual method above shows 56.18% positive, and if we subtract the 4.10% of the control which fell in the marker region, the resultant manual percentage of positive cells is 52.08%, which is very close to the subtracted result.

If you do histogram analysis regularly, and you would normally set markers subjectively, look to see if this subtraction feature is available in your software. If it is available, try utilising it in typical analyses and see how it compares with your current approach. It is worth noting that histogram subtraction is not widely used in the published literature, for a number of reasons, some of which are historical, and that this method may be criticised by some reviewers. Keep this in mind, and build your confidence in the method before relying on it.

Further reading:
"Introduction to Flow Cytometry", James V. Watson, Cambridge University Press. ISBN 0 521 38061 8
and an oldie but a goody:
"Flow Cytometry: Instrumentation and Data Analysis", edited by Marvin A. Van Dilla, Phillip N. Dean, Ole D. Laerum and Myron R. Melamed, Academic Press, 1985. ISBN 012 712150 1
Other interesting reading on histogram analysis:
Lampariello F. et al., Cytometry 1994; 15:294-301, Cytometry 1998; 32:241-254 and Cytometry 2000; 38:179-188
Watson J.V., Cytometry 2001; 43:55-68

