01-29-2014 09:24 PM
I'm trying to work out the best way to model/analyze some data. The data come from digital image analysis of a stained tissue on a microscope slide. The analysis uses a non-overlapping, uniform sampling window that reports the % of the window above threshold, so each tissue slide yields approximately 20,000 values. I've looked at the data using proc univariate and plotted either the cdfplot or the probplot. A histogram is another option, but the majority of the values are zero, so it is not as informative (to me, anyway). I've provided a probplot for some of the data below. As seen in the bottom panel (tissue G4419), the amount of staining (i.e., % above threshold) is related to the antibody concentration. The shape of the distribution reflects the fact that the windowed areas naturally do not contain equal amounts of the antibody target. In fact, an important question is to determine at what antibody concentration we start losing important staining. So I think a first question is: can this relationship between the distribution and antibody concentration be modeled?
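A minimal sketch of what the cdfplot is summarizing, written in Python rather than SAS and using made-up toy values (the slide labels, dilutions, and function names below are hypothetical). It computes the empirical CDF, the fraction of all-zero windows (the height of the jump at 0 in the CDF, which is what swamps the histogram), and a few inverse-CDF quantiles per antibody concentration. Tabulating these per concentration is one simple way to see which quantiles shift as the antibody is diluted.

```python
import math
from bisect import bisect_right


def ecdf(values):
    """Return F(x): the fraction of windows whose % stained is <= x."""
    data = sorted(values)
    n = len(data)

    def F(x):
        return bisect_right(data, x) / n

    return F


def empirical_quantile(values, p):
    """Inverse-ECDF quantile: smallest observed value v with F(v) >= p."""
    data = sorted(values)
    n = len(data)
    # small tolerance guards against p * n landing just above an integer
    k = max(0, math.ceil(p * n - 1e-9) - 1)
    return data[k]


def summarize(values, probs=(0.5, 0.75, 0.9)):
    """Fraction of all-zero windows plus selected empirical quantiles."""
    zero_frac = sum(1 for v in values if v == 0) / len(values)
    return {"zero_frac": zero_frac,
            **{p: empirical_quantile(values, p) for p in probs}}


# Hypothetical toy data: % stained per window at two antibody dilutions.
slides = {
    "1:100": [0, 0, 0, 5, 20, 40, 80, 95, 100, 100],
    "1:400": [0, 0, 0, 0, 0, 2, 5, 10, 30, 60],
}
for conc, vals in slides.items():
    print(conc, summarize(vals))
```

With real slides each list would hold roughly 20,000 window values; sorting once per slide keeps this fast enough.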
Second question: the distributions clearly differ greatly between tissue G4412 (upper) and G4419 (lower), each representing a different disease state. How do I compare these statistically? Could I compute, for each tissue, the quantile below which values fall under threshold, or show a difference in the proportion of windows falling within any windowing-percentage band I might choose (such as max, half-max, 90-100%, or 10-25%)? I keep thinking the solution might be Proc Quantreg, but I'm not sure, and I have never implemented it. I have also wondered what constraints the type of data might impose, i.e., the fact that each measure is a percentage. Is this like a survival analysis?
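A rough sketch of the two comparisons asked about, in Python with hypothetical function names: the proportion of windows in a chosen band (e.g. 90-100%) compared between two tissues with a pooled two-proportion z-test, and the two-sample Kolmogorov-Smirnov statistic, which is the largest vertical gap between the two cdfplot curves. On the survival question: 1 - F(x), the fraction of windows stained above x, has the same form as a survival curve, which may be why the data feel survival-like; whether survival methods apply depends on whether any values are censored, which the post does not say.

```python
import math
from bisect import bisect_right


def band_count(values, lo, hi):
    """Number of windows with lo <= % stained <= hi, and the total."""
    return sum(1 for v in values if lo <= v <= hi), len(values)


def two_proportion_z(k1, n1, k2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumes independent windows, which spatially adjacent windows are not.
    """
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("both groups are all-in or all-out of the band")
    z = (k1 / n1 - k2 / n2) / se
    # two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2))


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    grid = sorted(set(a) | set(b))
    return max(abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
               for x in grid)


# Hypothetical usage: share of windows stained 90-100% in two tissues.
k1, n1 = band_count([0, 0, 95, 100, 40, 92], 90, 100)
k2, n2 = band_count([0, 0, 0, 10, 91, 5], 90, 100)
print(two_proportion_z(k1, n1, k2, n2))
```

A caution on both tests: windows from the same slide are spatially correlated, and the slide (or animal) is the real experimental unit, so treating ~20,000 windows per slide as independent will make p-values far too small. With replicate slides per disease state, comparing one band proportion or quantile per slide is likely the safer design.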
I look forward to learning from your suggestions.
01-30-2014 09:22 AM
Yum. This looks like a tasty problem (and one I may have to address soon myself with IHC data).
I think QUANTREG might be a good start. You may have to transform the percent area stained if some of the values are near zero; note, though, that QUANTREG, unlike ordinary least squares, does not assume normally distributed errors. One of the good things about QUANTREG is that you can fit for various quantiles and then test whether the effects are homogeneous across quantiles.
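To make the idea concrete, here is a toy Python illustration of the objective that quantile regression minimizes, the check (pinball) loss, fitted separately for several quantiles. It is not how PROC QUANTREG computes its fits: it brute-forces every line through two data points, which works because an optimal linear fit with an intercept and one covariate interpolates at least two observations, but it costs O(n^3) and is only for tiny data. The function names and the dilution/% stained values are hypothetical, and the sketch gives point estimates only; the formal test of homogeneity across quantiles would come from QUANTREG itself.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_quantile_line(x, y, tau):
    """Exact linear quantile regression y ~ a + b*x by brute force.

    An optimal fit passes through at least two observations, so trying
    every line through a pair of points with distinct x finds a minimizer.
    O(n^3): only for small illustrative data sets.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        b = (y[j] - y[i]) / (x[j] - x[i])
        a = y[i] - b * x[i]
        loss = sum(check_loss(yk - (a + b * xk), tau) for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data: log2 dilution step vs % stained (toy values).
x = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
y = [0, 10, 40, 90, 0, 5, 20, 60, 0, 2, 8, 30]
for tau in (0.25, 0.5, 0.9):
    a, b = fit_quantile_line(x, y, tau)
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}")
```

If the slopes differ noticeably across tau, dilution is affecting the upper tail (the heavily stained windows) differently from the middle of the distribution, which is exactly what a homogeneity test across quantiles would formalize.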