Using the following dataset for illustration:
data test;
input id $ x y;
cards;
a1 1 2
a1 1 4
a1 2 8
a1 2 16
b1 2 2
b1 2 4
b1 2 8
b1 3 16
b1 4 32
;
run;
I'd like to retain all records sharing the lowest values of x by subject id:
id x y
a1 1 2
a1 1 4
b1 2 2
b1 2 4
b1 2 8
There's a simple solution I'm missing -- tried the following code but it didn't work:
data test2;
set test;
by id x;
if first.x;
run;
Since your data are sorted by ID and X, when you read the first record for an ID-group, you know it will have the minimum value for x.
data test2;
set test;
by id x;
retain _xmin ;
if first.id then _xmin=x;
if x=_xmin ;
*drop _xmin ;
run;
Of course if you have missing values for X, this won't work because they will be the minimum value.
proc summary data=test nway;
class id;
var x;
output out=stats min=min_x;
run;
data want;
merge test stats(drop=_:);
by id;
if x=min_x;
run;
Thanks, that worked!
So it's not a matter of tweaking the syntax in my original attempt - looks like we have to find a work around or else use proc sql?
Since your data are sorted by ID and X, when you read the first record for an ID-group, you know it will have the minimum value for x.
data test2;
set test;
by id x;
retain _xmin ;
if first.id then _xmin=x;
if x=_xmin ;
*drop _xmin ;
run;
Of course if you have missing values for X, this won't work because they will be the minimum value.
Thanks Quentin!
@Quentin wrote:
Since your data are sorted by ID and X, when you read the first record for an ID-group, you know it will have the minimum value for x.
@RobertWF1 is that always the case that the data will be sorted like you show?
@PaigeMiller wrote:
@Quentin wrote:
Since your data are sorted by ID and X, when you read the first record for an ID-group, you know it will have the minimum value for x.
@RobertWF1 is that always the case that the data will be sorted like you show?
If it isn't sorted, the
by id x;
statement will throw an error. I often include "extra" variables on the BY statement like this, just to confirm that the data are sorted as the logic expects. There is an efficiency cost to this, because the DATA step will create first.x and last.x which are not needed, but the efficiency loss is worth it to me to know that the data are sorted, and know I will get and error rather than a wrong result if the data are not sorted.
Just having some fun for posting a PROC SQL solution:
data test;
input id $ x y;
cards;
a1 1 2
a1 1 4
a1 2 8
a1 2 16
b1 2 2
b1 2 4
b1 2 8
b1 3 16
b1 4 32
;
run;
proc sql;
create table want as
select * from test group by id having x=min(x);
quit;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.