Hi!
I have a dataset sorted by PID (the ID of each participant, N=77) with several continuous variables.
I want to find which observations (PIDs) have values more than 2 SD from the mean.
I am working with a temporary dataset called cortical_stroke_complete2.
The variable of interest is PCh_CONTRA_entorhinal_tck.
This is the code I have been trying, without success. I tried different variables and always obtained the same PID.
proc sql noprint;
    select mean(PCh_CONTRA_entorhinal_tck) into :mean from cortical_stroke_complete2;
    select std(PCh_CONTRA_entorhinal_tck) into :std from cortical_stroke_complete2;
quit;
data cortical_stroke_complete2;
    set cortical_stroke_complete2;
    where PCh_CONTRA_entorhinal_tck > &mean + 2*&std;
run;
proc print; run;
I am a basic SAS programmer.
Thank you so much!
No need for macro variables to do that. Try this:
proc sql;
    select *
        from cortical_stroke_complete2
        having PCh_CONTRA_entorhinal_tck - mean(PCh_CONTRA_entorhinal_tck) > 2*std(PCh_CONTRA_entorhinal_tck);
quit;
(untested)
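If values far below the mean matter as well, the same query can be written with abs(). This is only a sketch and is equally untested; it relies on PROC SQL remerging the summary statistics back onto each row, which SAS reports with a note in the log. Selecting just PID and the variable of interest is a choice made here for readability.

```sas
proc sql;
    /* abs() flags values more than 2 SD away on either side of the mean. */
    /* Rows with a missing value yield a missing difference and are not selected. */
    select PID, PCh_CONTRA_entorhinal_tck
        from cortical_stroke_complete2
        having abs(PCh_CONTRA_entorhinal_tck - mean(PCh_CONTRA_entorhinal_tck))
               > 2*std(PCh_CONTRA_entorhinal_tck);
quit;
```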
Getting the same PID each time does not mean anything is wrong. It is possible that only one patient has outlier values.
When looking for values more than 2 standard deviations from the mean, you may have to consider two standard deviations in both directions.
where (PCh_CONTRA_entorhinal_tck > &mean + 2*&std) or (PCh_CONTRA_entorhinal_tck < &mean - 2*&std);
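One more thing worth checking: the DATA step in the question writes its result back to cortical_stroke_complete2, so after the first run that dataset holds only the filtered rows, and later runs compute the mean and SD from that subset. A minimal sketch of the two-sided filter that leaves the input untouched follows. The output name outliers is a hypothetical choice, and the explicit missing-value check is an addition, since SAS treats a missing value as smaller than any number and would otherwise flag it as a low outlier.

```sas
/* Store the mean and SD in macro variables with a single query. */
proc sql noprint;
    select mean(PCh_CONTRA_entorhinal_tck), std(PCh_CONTRA_entorhinal_tck)
        into :mean, :std
        from cortical_stroke_complete2;
quit;

/* Write flagged rows to a separate dataset ("outliers" is a hypothetical
   name) so that cortical_stroke_complete2 keeps all of its observations. */
data outliers;
    set cortical_stroke_complete2;
    where not missing(PCh_CONTRA_entorhinal_tck)
      and ((PCh_CONTRA_entorhinal_tck > &mean + 2*&std)
        or (PCh_CONTRA_entorhinal_tck < &mean - 2*&std));
run;

proc print data=outliers;
    var PID PCh_CONTRA_entorhinal_tck;
run;
```

If the original dataset has already been overwritten, it would need to be recreated from its source before rerunning this.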
If you suspect this is not giving you the right result, print &mean and &std, and inspect 20 lines of data to see if you can confirm whether the result should be different.
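A short sketch of that check, assuming &mean and &std were created by the PROC SQL step in the question. The PROC MEANS step is an extra cross-check added here: its N should match the 77 participants if the dataset is intact.

```sas
/* Write the stored macro variable values to the log. */
%put NOTE: mean=&mean std=&std;

/* Recompute the statistics directly for comparison. */
proc means data=cortical_stroke_complete2 n mean std min max;
    var PCh_CONTRA_entorhinal_tck;
run;

/* Print the first 20 observations of the variable of interest. */
proc print data=cortical_stroke_complete2 (obs=20);
    var PID PCh_CONTRA_entorhinal_tck;
run;
```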
Good luck.