
## Analysis of Negatively skewed nested data

Hi,

I need help analyzing my data, which is negatively skewed (skewness of approximately -2.5) with around 35% of the values at 0. My experiment: each person is scanned under different cases, in 3 trials, and each trial produces 12 scans of that person. So I clearly have a nested structure. I tried fitting gamma and lognormal distributions to this data, but they all run into convergence issues. These are the residuals from the normal-distribution fit. Can anyone suggest what I can do better with this data? Thank you so much.

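```sas
title "Pelvic Lateral Deviation 504 analysis";
/* Linear mixed model (default DIST=NORMAL): fixed effect of case,       */
/* a random intercept for each person within case, and a random effect   */
/* for each trial within person*case.                                    */
proc glimmix data=full_sta1 plots=all;
   class case pt trial;
   model PlumbResult_0504_LateralDeviatio = case / ddfm=KR;
   random intercept / subject=pt(case);
   random trial(pt*case);
run;
```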


## Re: Analysis of Negatively skewed nested data

You could try something called a Box-Cox transformation, which will transform the data to something approximately normally distributed, if such a transformation exists. This can be done in PROC TRANSREG (and maybe other procedures as well). See: https://documentation.sas.com/?docsetId=statug&docsetTarget=statug_odsgraph_sect010.htm&docsetVersio...
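A minimal sketch of a Box-Cox search in PROC TRANSREG, assuming the data set `full_sta1` and the response name from the question. Box-Cox is only defined for strictly positive values, and this response has many exact zeros and appears to be bounded above by zero, so the sketch first reflects and shifts it. The variable `y_pos` and the shift constant of 1 are hypothetical choices, not part of the original posts. Note that this ignores the nested structure entirely.

```sas
/* Box-Cox needs y > 0. The response appears to be <= 0, so reflect it   */
/* and add a constant. The constant 1 is an arbitrary illustrative value. */
data bc_in;
   set full_sta1;
   y_pos = 1 - PlumbResult_0504_LateralDeviatio;
run;

/* Search lambda over a grid; CONVENIENT picks a nearby "nice" lambda.   */
/* CLASS(case) mirrors the fixed effect in the question's MODEL.         */
ods graphics on;
proc transreg data=bc_in details;
   model boxcox(y_pos / lambda=-2 to 2 by 0.1 convenient) = class(case);
run;
```

Any lambda chosen this way applies to the reflected, shifted variable, so results must be interpreted on that scale.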

I would try this on the average for each person, rather than on the 3 trials x 12 scans for each person.
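Averaging first can be sketched like this, again assuming `full_sta1` and the variable names from the question. The output data set `pt_means`, the mean variable `y_mean`, and the reflected `y_pos` are hypothetical names; the reflect-and-shift step is needed because Box-Cox requires positive values.

```sas
/* Collapse the 3 trials x 12 scans to one mean per person within case. */
/* NWAY keeps only the full pt*case combinations in the output.         */
proc means data=full_sta1 noprint nway;
   class pt case;
   var PlumbResult_0504_LateralDeviatio;
   output out=pt_means mean=y_mean;
run;

/* Reflect and shift the means so Box-Cox sees strictly positive values */
data pt_means;
   set pt_means;
   y_pos = 1 - y_mean;
run;

proc transreg data=pt_means details;
   model boxcox(y_pos / lambda=-2 to 2 by 0.1 convenient) = class(case);
run;
```

Working with one value per person and case removes the trial and scan levels from the analysis, which is the point of the suggestion: the transformation is judged on the quantity you actually compare across cases.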

--
Paige Miller

## Re: Analysis of Negatively skewed nested data

Your description does not give us enough information to determine whether the statistical model is correct. For example, how many levels of CASE are there, and how does CASE relate to TRIAL? Are there 3 CASEs with one TRIAL each? Or 3 TRIALs for each CASE? What research question do you have that would be addressed by 12 SCANs in each TRIAL?
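Some of these design questions can be answered directly from the data by tabulating how the levels occur together. A minimal sketch, assuming the data set `full_sta1` and the variables `case`, `trial`, and `pt` from the posted code:

```sas
/* Which TRIAL levels occur within each CASE? One row per combination. */
proc freq data=full_sta1;
   tables case*trial / list missing;
run;

/* How many scans does each person contribute per case and trial? */
proc freq data=full_sta1;
   tables pt*case*trial / list;
run;
```

If each trial really produces 12 scans per person, the frequency column of the second table should be 12 on every row.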

Your residual plot and data plot show that there is an upper bound (which is zero) to your response "PlumbResult_0504_LateralDeviatio". Neither the lognormal nor the gamma distribution is appropriate for data with an upper bound; both the lognormal and the gamma have a lower bound at zero and an upper bound of infinity. Both should have failed miserably with a response with negative values (the log of zero or of a negative number is not defined, and I would guess that there was a message to that effect in the log window; always pay attention to the log window).
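A quick way to confirm the bound and see how many values fall outside the support of the lognormal and gamma distributions, assuming `full_sta1` and the response name from the question (the data set `chk` and the variable `sign` are hypothetical):

```sas
/* Range and shape of the response: a maximum of 0 confirms the bound */
proc means data=full_sta1 n nmiss min max mean skewness;
   var PlumbResult_0504_LateralDeviatio;
run;

/* Classify each value; lognormal and gamma need strictly positive y, */
/* so every "zero" and "negative" row is outside their support.       */
data chk;
   set full_sta1;
   length sign $8;
   if missing(PlumbResult_0504_LateralDeviatio) then sign = 'missing';
   else if PlumbResult_0504_LateralDeviatio < 0 then sign = 'negative';
   else if PlumbResult_0504_LateralDeviatio = 0 then sign = 'zero';
   else sign = 'positive';
run;

proc freq data=chk;
   tables sign;
run;
```

The missing check comes first because SAS treats a missing numeric value as smaller than any number, so it would otherwise be counted as negative.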

So, we need to know more about what your response is measuring, in addition to more about your experimental design. Guessing wildly, you might have more luck redefining your response as (-1)*response, if that was sensible in context; that redefined response might follow the exponential distribution, and then the gamma might work (although gamma mixed models can be very persnickety).
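If reflecting the response does make sense in context, a gamma mixed model could be sketched as below, keeping the random effects from the question. This is a sketch under assumptions: `full_refl` and `y_neg` are hypothetical names, and exact zeros lie outside the gamma support, so with about 35% zeros this may still fail or drop rows; check the log.

```sas
/* Reflect the response so it is >= 0. Only sensible if the sign carries */
/* no meaning beyond direction. y_neg is a hypothetical variable name.   */
data full_refl;
   set full_sta1;
   y_neg = -1 * PlumbResult_0504_LateralDeviatio;
run;

/* Gamma GLMM with a log link and the question's random effects.         */
/* DIST=EXPONENTIAL is also available in GLIMMIX for the special case.   */
/* Rows with y_neg = 0 are outside the gamma support; check the log.     */
proc glimmix data=full_refl;
   class case pt trial;
   model y_neg = case / dist=gamma link=log solution;
   random intercept / subject=pt(case);
   random trial(pt*case);
run;
```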