I have multiple entries with the same ID variable. I'm using proc summary to keep the max of each of them. The code example is below. For the variables in the macro variable fvar specified in the var statement, the max is applied, but what happens to the variables in the id statement (macro variable &other. The variables in other are character variables with qualitative data. If there is a missing value and a non missing I would like to keep the non missing, but what happens when both are non missing, which one is kept?
proc summary data=bva1 nway; class ID; var &fvar.; id &other; output out=bva2 max=;
@Satori ,
I'm not sure you understand what you are asking for here. Let's say you have three ID variables:
id idvar1 idvar2 idvar3;
The values in the output data set are NOT the maximum values of each. (Yes, that would give you the non-missing, if there is both a missing and a non-missing.) But the values selected for ID variables are computed sequentially. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1niocuk07q2wnn16mz96fboupgc.htm
This means you get the maximum of idvar1. But you don't necessarily get the maximum of idvar2 or idvar3. Instead, SAS selects in this manner. For all observations that have the maximum value of idvar1, what is the maximum value of idvar2. There may be observations that have a lower value for idvar1, and a higher value for idvar2. SAS doesn't consider those. It only looks at observations having the maximum value for idvar1, when selecting the maximum value for idvar2. So it's the last observation per ID (your CLASS variable) if the data were sorted by ID idvar1 idvar2 idvar3.
The documentation explains:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1niocuk07q2wnn16mz96fboupgc.htm
When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set.
Use IDMIN in the PROC statement to include the minimum value of the ID variables in the output data set.
@Satori wrote:
What if the variables are character variables with text? What is the max/min of that?
Some things you can figure out for yourself. What is the minimum/maximum of:
Ant
Bear
Cougar
Dog
@Satori wrote:
What if the variables are character variables with text? What is the max/min of that?
Provide actual data.
Provide the actual code instead of hiding variable(s) behind macro variables.
If you do not understand the results, or they don't match what you need, describe what you expect/need/want.
ID variables in Proc Means provide additional output variables. What can depend on options in several places. By default (from the documentation)
When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set.
IDMIN on the proc statement gets the minimum value associated. But don't expect character values to behave nicely. Since "missing" is generally smaller than anything if your ID group as missing values they likely wont appear in the ID variables in the output unless IDMIN is set or all the Id values are missing fro the group.
Multiple ID variables and SAS goes into use of the multiple if there are "ties" in the first ID variable in the list.
It's all explained in the documentation I linked to. Its all very sensible, it does exactly what you would think it should do.
And again, this is simple enough for you to figure out yourself. So, why does the two PROC SUMMARY below give different results? Can you explain why the results are what they are?
proc summary nway data=sashelp.cars;
class origin;
var cylinders;
id make model;
output out=maxx max=;
run;
proc summary nway idmin data=sashelp.cars;
class origin;
var cylinders;
id make model;
output out=maxx2 max=;
run;
@Satori ,
I'm not sure you understand what you are asking for here. Let's say you have three ID variables:
id idvar1 idvar2 idvar3;
The values in the output data set are NOT the maximum values of each. (Yes, that would give you the non-missing, if there is both a missing and a non-missing.) But the values selected for ID variables are computed sequentially. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1niocuk07q2wnn16mz96fboupgc.htm
This means you get the maximum of idvar1. But you don't necessarily get the maximum of idvar2 or idvar3. Instead, SAS selects in this manner. For all observations that have the maximum value of idvar1, what is the maximum value of idvar2. There may be observations that have a lower value for idvar1, and a higher value for idvar2. SAS doesn't consider those. It only looks at observations having the maximum value for idvar1, when selecting the maximum value for idvar2. So it's the last observation per ID (your CLASS variable) if the data were sorted by ID idvar1 idvar2 idvar3.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.