09-28-2015 10:32 PM
I have a dataset sorted by PID (the ID of each participant, N=77) with several continuous variables.
I want to find which observations (PIDs) have values more than 2 SD from the mean.
I am working with a temporary dataset called cortical_stroke_complete2.
The variable of interest is PCh_CONTRA_entorhinal_tck.
This is the code I have been trying without success. I tried different variables and always obtained the same PID:
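proc sql noprint; /* store the mean and SD in macro variables */
select mean(PCh_CONTRA_entorhinal_tck) into :mean from cortical_stroke_complete2;
select std(PCh_CONTRA_entorhinal_tck) into :std from cortical_stroke_complete2;
quit;
proc print data=cortical_stroke_complete2; /* list observations above mean + 2 SD */
var PID PCh_CONTRA_entorhinal_tck;
where PCh_CONTRA_entorhinal_tck > &mean + 2*&std;
run;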
Basic SAS programmer.
Thank you so much!
09-28-2015 11:03 PM
There is no need for macro variables here: PROC SQL can compute the mean and SD and remerge them with every row. Try this:
proc sql;
select *
from cortical_stroke_complete2
having PCh_CONTRA_entorhinal_tck - mean(PCh_CONTRA_entorhinal_tck)
       > 2*std(PCh_CONTRA_entorhinal_tck);
quit;
09-29-2015 07:50 PM
Getting the same PID each time does not mean anything is wrong. It is possible that only one patient has outlier values.
When looking for values more than 2 standard deviations from the mean, you may need to check both directions, above and below the mean:
where not missing(PCh_CONTRA_entorhinal_tck) and
      ((PCh_CONTRA_entorhinal_tck > &mean + 2*&std) or (PCh_CONTRA_entorhinal_tck < &mean - 2*&std));
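A minimal sketch that combines the remerge approach with the two-sided check in a single PROC SQL step. It assumes the ID variable is named PID, and outliers_2sd is a hypothetical name for the output table. It also reports each flagged value's z-score so you can see how far out it lies. In SAS a missing value compares as smaller than any number, which is why a plain lower-bound test can pick up participants with a missing value; here abs() of a missing difference is missing and never exceeds 2*std, so those rows are dropped.

```sas
/* Two-sided 2 SD screen in one PROC SQL step.                          */
/* Assumes cortical_stroke_complete2 contains PID and the variable.    */
proc sql;
  create table outliers_2sd as              /* hypothetical output name */
  select PID,
         PCh_CONTRA_entorhinal_tck,
         /* z-score: distance from the mean in (sample) SD units */
         (PCh_CONTRA_entorhinal_tck - mean(PCh_CONTRA_entorhinal_tck))
           / std(PCh_CONTRA_entorhinal_tck) as z
  from cortical_stroke_complete2
  /* mean and std are remerged with every row; keep rows beyond 2 SD either way */
  having abs(PCh_CONTRA_entorhinal_tck - mean(PCh_CONTRA_entorhinal_tck))
         > 2*std(PCh_CONTRA_entorhinal_tck);
quit;

proc print data=outliers_2sd;
run;
```

STD here is the sample standard deviation (divisor n-1), the same statistic used by the macro-variable version in the question, so both approaches should flag the same PIDs.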
If you suspect this is not giving you the right result, print the values of &mean and &std (for example with %put), and inspect about 20 lines of data to confirm whether the result should be different.
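One way to do those checks, as a sketch assuming &mean and &std were created by the PROC SQL step in the question and that the ID variable is PID: write the macro variables and the resulting cutoffs to the log, recompute the statistics independently with PROC MEANS, and list the first 20 observations. Macro variables hold text, so small rounding differences from the PROC MEANS output are possible.

```sas
/* Show the stored statistics and the 2 SD cutoffs in the log */
%put NOTE: mean=&mean std=&std;
%put NOTE: upper cutoff = %sysevalf(&mean + 2*&std);
%put NOTE: lower cutoff = %sysevalf(&mean - 2*&std);

/* Recompute the same statistics; NMISS shows how many values are missing */
proc means data=cortical_stroke_complete2 n nmiss mean std min max;
  var PCh_CONTRA_entorhinal_tck;
run;

/* Compare the first 20 rows against the cutoffs by eye */
proc print data=cortical_stroke_complete2 (obs=20);
  var PID PCh_CONTRA_entorhinal_tck;
run;
```

If MIN and MAX from PROC MEANS fall beyond a cutoff only for one participant, a single PID in the output is the expected result.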