- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I have multiple entries with the same ID variable. I'm using proc summary to keep the max of each of them. The code example is below. For the variables in the macro variable fvar specified in the var statement, the max is applied, but what happens to the variables in the id statement (macro variable &other. The variables in other are character variables with qualitative data. If there is a missing value and a non missing I would like to keep the non missing, but what happens when both are non missing, which one is kept?
proc summary data=bva1 nway; class ID; var &fvar.; id &other; output out=bva2 max=;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Satori ,
I'm not sure you understand what you are asking for here. Let's say you have three ID variables:
id idvar1 idvar2 idvar3;
The values in the output data set are NOT the maximum values of each. (Yes, that would give you the non-missing, if there is both a missing and a non-missing.) But the values selected for ID variables are computed sequentially. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1niocuk07q2wnn16mz96fboupgc.htm
This means you get the maximum of idvar1. But you don't necessarily get the maximum of idvar2 or idvar3. Instead, SAS selects in this manner. For all observations that have the maximum value of idvar1, what is the maximum value of idvar2. There may be observations that have a lower value for idvar1, and a higher value for idvar2. SAS doesn't consider those. It only looks at observations having the maximum value for idvar1, when selecting the maximum value for idvar2. So it's the last observation per ID (your CLASS variable) if the data were sorted by ID idvar1 idvar2 idvar3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
The documentation explains:
https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.4/proc/n1niocuk07q2wnn16mz96fboupgc.htm
When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set.
Use IDMIN in the PROC statement to include the minimum value of the ID variables in the output data set.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Satori wrote:
What if the variables are character variables with text? What is the max/min of that?
Some things you can figure out for yourself. What is the minimum/maximum of:
Ant
Bear
Cougar
Dog
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Satori wrote:
What if the variables are character variables with text? What is the max/min of that?
Provide actual data.
Provide the actual code instead of hiding variable(s) behind macro variables.
If you do not understand the results, or they don't match what you need, describe what you expect/need/want.
ID variables in Proc Means provide additional output variables. What can depend on options in several places. By default (from the documentation)
When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set.
IDMIN on the proc statement gets the minimum value associated. But don't expect character values to behave nicely. Since "missing" is generally smaller than anything if your ID group as missing values they likely wont appear in the ID variables in the output unless IDMIN is set or all the Id values are missing fro the group.
Multiple ID variables and SAS goes into use of the multiple if there are "ties" in the first ID variable in the list.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
It's all explained in the documentation I linked to. Its all very sensible, it does exactly what you would think it should do.
And again, this is simple enough for you to figure out yourself. So, why does the two PROC SUMMARY below give different results? Can you explain why the results are what they are?
proc summary nway data=sashelp.cars;
class origin;
var cylinders;
id make model;
output out=maxx max=;
run;
proc summary nway idmin data=sashelp.cars;
class origin;
var cylinders;
id make model;
output out=maxx2 max=;
run;
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
@Satori ,
I'm not sure you understand what you are asking for here. Let's say you have three ID variables:
id idvar1 idvar2 idvar3;
The values in the output data set are NOT the maximum values of each. (Yes, that would give you the non-missing, if there is both a missing and a non-missing.) But the values selected for ID variables are computed sequentially. https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/proc/n1niocuk07q2wnn16mz96fboupgc.htm
This means you get the maximum of idvar1. But you don't necessarily get the maximum of idvar2 or idvar3. Instead, SAS selects in this manner. For all observations that have the maximum value of idvar1, what is the maximum value of idvar2. There may be observations that have a lower value for idvar1, and a higher value for idvar2. SAS doesn't consider those. It only looks at observations having the maximum value for idvar1, when selecting the maximum value for idvar2. So it's the last observation per ID (your CLASS variable) if the data were sorted by ID idvar1 idvar2 idvar3.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content