Hi SAS Users,
Yesterday, I learned how to tag along a variable to proc means by using ID statement as suggested by @FreelanceReinh . However, the description for this one is quite ambiguous (without example).
Therefore, I did another search and see there is a discussion having the example and quite nice.
I use the sashelp.class dataset for this example
The dataset is:
Name Sex Age Height Weight
Joyce F 11 51.3 50.5
Thomas M 11 57.5 85
Jane F 12 59.8 84.5
Louise F 12 56.3 77
James M 12 57.3 83
John M 12 59 99.5
Robert M 12 64.8 128
Alice F 13 56.5 84
Barbara F 13 65.3 98
Jeffrey M 13 62.5 84
Carol F 14 62.8 102.5
Judy F 14 64.3 90
Alfred M 14 69 112.5
Henry M 14 63.5 102.5
Janet F 15 62.5 112.5
Mary F 15 66.5 112
Ronald M 15 67 133
William M 15 66.5 112
Philip M 16 72 150
My code is as below
data have;
set sashelp.class;
run;
proc sort data= have;
by age;
run;
proc means data=have;
by age;
var Height Weight;
id Sex;
output out=means(drop =_type_ _freq_);
run;
As what I understand, The value of variable Sex will follow the value of variable Age (if there are two vars in IDs, the second var will follow the maximum value of the first var (same row))
identifies one or more variables from the input data set whose maximum values for groups of observations PROC MEANS includes in the output data set
note that "maximum" refers to alphabetical order in case of character variables such as LOC
But, the result from running the code above is
I have two curiousities here:
1. I confused because the value shown in Sex should be "F" rather than "M:" (because 'F' standing in front of 'M' in alphabet. Even it does not relate to the calculation in general, but because I want to apply in my case with the variable LOC, so I need things precisely.
2. The ID statement seems to be efficiently used in a case that the character variables are constant over time like in my previous topic?
Warmest regards.
Hi @Phil_NZ,
@Phil_NZ wrote:
1. I confused because the value shown in Sex should be "F" rather than "M:" (because 'F' standing in front of 'M' in alphabet.
No, this is not how the order of character variables is defined in SAS. For SAS 'A' < 'B' < 'C' < ... like 1 < 2 < 3 < ... More precisely, SAS (under Windows) uses the underlying numeric ASCII codes to define the order of character values. As a consequence, lowercase letters are "greater than" uppercase letters (e.g., 'Z' < 'a' is true), which in turn are "greater than" digits. See an ASCII table or play with the RANK or BYTE function to find out more details.
So, the "maximum" in the context of the ID statement in PROC MEANS refers to the value that comes last according to the ASCII collating sequence. In your example each of the age groups contains at least one male person and since 'M' > 'F' all age groups are "identified" by Sex='M'. Be creative and modify the example by assigning random letters (combine the BYTE and RAND('integer',...) functions) to variable Sex and see how the results change.
RTM.
ID -> identifies one or more variables from the input data set whose maximum values for groups of observations PROC MEANS includes in the output data set.
You likely need to remerge your data, ID identifies the SEX with the maximum value. If the value is not consistent within the BY groups and you're expecting the value from the original data then you must remerge instead of use ID.
Or look at the IDMIN or PRINTIDVARS option on the PROC statement as the documentation suggests.
How to remerge:
https://github.com/statgeek/SAS-Tutorials/blob/master/add_average_value_to_dataset.sas
Documentation
https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.5&docsetId=proc&docsetTarget=n1niocu...
PROC SUMMARY offers some other ID options but in general, I prefer to just remerge as it's less testing or worrying.
Hi @Reeza
Many thanks for your suggestion
Let's say about the github solution, I deal with the Solution 1
*Add average value to a dataset;
*Solution 1 - PROC MEANS + Data step;
******************************************************;
proc means data=sashelp.class noprint;
output out=avg_values mean(height)=avg_height;
run;
data class_data;
set sashelp.class;
if _n_=1 then
set avg_values;
run;
proc print data=class_data;/* I adjusted this one from class to class_data because dataset class does not exist*/
run;
The result is as below:
It is a desired result, but I do not know how SAS put avg_height to every row like that. What I focus on is
data class_data;
set sashelp.class;
if _n_=1 then
set avg_values;
run;
From such a code, we only have one time that _n_1 at the first iteration. So, avg_height should only display at the first row, and "." afterwards.
I know that my imagined running maybe wrong but could you please explain it to me?
Apart from that, could you please tell me what RTM stand for?
Warmest regards.
@Phil_NZ wrote:
data class_data; set sashelp.class; if _n_=1 then set avg_values; run;
From such a code, we only have one time that _n_1 at the first iteration. So, avg_height should only display at the first row, and "." afterwards.
Remember that variables read with a SET statement (here: avg_height from the second SET statement) are automatically retained, i.e., they are not reset to missing when the DATA step iterates. See section Redundancy in the documentation of the RETAIN statement.
Hi @Phil_NZ,
@Phil_NZ wrote:
1. I confused because the value shown in Sex should be "F" rather than "M:" (because 'F' standing in front of 'M' in alphabet.
No, this is not how the order of character variables is defined in SAS. For SAS 'A' < 'B' < 'C' < ... like 1 < 2 < 3 < ... More precisely, SAS (under Windows) uses the underlying numeric ASCII codes to define the order of character values. As a consequence, lowercase letters are "greater than" uppercase letters (e.g., 'Z' < 'a' is true), which in turn are "greater than" digits. See an ASCII table or play with the RANK or BYTE function to find out more details.
So, the "maximum" in the context of the ID statement in PROC MEANS refers to the value that comes last according to the ASCII collating sequence. In your example each of the age groups contains at least one male person and since 'M' > 'F' all age groups are "identified" by Sex='M'. Be creative and modify the example by assigning random letters (combine the BYTE and RAND('integer',...) functions) to variable Sex and see how the results change.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.