- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
dear Community members,
Happy 2014! Extracting descriptive statistics is among the very first steps of many statistical analyses. I am trying to automate the routine of getting basic measures, such as mean, median, standard deviation and IQR from PROC UNIVARIATE. Unfortunately I got stuck when attempting to extracting data from an ods object called 'BasicMeasures.' By default you will get the following output when using ods output BasicMeasures=:
where the first few columns are your BY-group variables, followed by four basic measure columns- LocMeasure (the highlighted one), LocValue, VarName and VarValue. I would like to create one subset containing the BY-groups+ LocMeasure+ LocValue and one subset containing BY-groups+ VarName and VarValue. As shown in the screenshot of the column attributes, the column name I want to extract is correctly identified but I got an error message 'ERROR: variable LocMeasure and LocValue are not found in the work file. Same error occurs for the other two variables VarName and VarValue. I suspect this may be due to the conflict between the 3 data set options KEEP=, WHERE= and RENAME= in my SET statement, because my codes are working when RENAME= is removed. But I have to rename the two columns in the two subsets so I can stack them later. Any idea? Here are my codes
*step 1: sort data*;
proc sort data=OA_MT_wide; by scoreType scoreValue scoreOrder; run;
*step 2: output basic measures*;
proc univariate data= OA_MT_wide normal;
by scoreType scoreValue scoreOrder ;
class scoreOrder;
var age;
ods output TestsForNormality= _normality_age;
ods output BasicMeasures= _basic_age;
run;
*step 3: get mean, median, standard deviation and IQR and group them into two columns- statType and statValue*;
data basic_age; set
_basic_age(keep = scoreType scoreValue scoreOrder LocMeasure LocValue
in=a
where=(LocMeasure in ('Mean' 'Median'))
rename=(LocMeasure= statType LocValue= statValue))
_basic_age(keep = scoreType scoreValue scoreOrder VarName VarValue
in=b
where=(VarName in ('Std Deviation' 'Interquartile Range'))
rename=(VarName= statType VarValue= statValue))
;
RUN;
My data are in the excel worksheet 'data'. My codes are also attached (normality_short.sas) for your convenience. Many thanks in advance.rder
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Change your WHERE data set options to use the RENAMEd variable names. Where=(statType in('Mean' ...
I would look to PROC SUMMARY to create data sets of summary statistics, should be much easier.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thank you very much. In proc summary, each basic measure can be explicitly specified and this also produce a wide-format output.