Hi All,
I am wondering if its possible to solve the following problem in SAS:
I have a dataset as follows -
id | var1 | var2 |
8250 | 1 | 23987 |
8250 | 45 | 13841 |
8250 | 79 | 13223 |
8250 | 24 | 10031 |
8250 | 54 | 8849 |
8250 | 88 | 8284 |
8250 | 51 | 4733 |
8250 | 71 | 4109 |
14984 | 80 | 1695 |
14984 | 55 | 195 |
14984 | 83 | 184 |
14984 | 26 | 166 |
14984 | 84 | 141 |
14984 | 40 | 127 |
14984 | 17 | 122 |
14984 | 96 | 120 |
14984 | 10 | 96 |
I just would like to have a unique record per column id which takes the max value from var 2 and gives -
id | var1_2 |
8250 | 1 45 79 24 54 |
14984 | 80 55 83 26 84 |
Is thsi possible in Base SAS?
KR
SK
So you want the max five value for each ID ? data have; infile cards expandtabs truncover; input id var1 var2; datalines; 8250 1 23987 8250 51 4733 8250 45 13841 8250 71 4109 8250 24 10031 8250 54 8849 8250 88 8284 8250 79 13223 14984 10 96 14984 55 195 14984 26 166 14984 84 141 14984 40 127 14984 83 184 14984 17 122 14984 80 1695 14984 96 120 ; run; proc sort data=have; by id descending var2; run; data temp; set have; by id; if first.id then n=0; n+1; if n le 5; run; data want; length var1_2 $ 2000; do until(last.id); set temp; by id; var1_2=catx(' ',var1_2,var1); end; keep id var1_2; run;
[1] Your input data seems to be sorted both ID and VAR2. Here is a solution which requires data to be sorted by ID alone.
[2] You seem to want the first 5 largest values of Var2, although it is not mentioned. The present solution can handle as many values as you want.
[3] An array approch is used. It requires the largest ID-Group size which is 9 in your data. Again this size can be done programmatically. Proc Sql may be used but I use a data step which will be faster than SQL when data set is large.
[4] Your input data set is slightly disordered by Var2 within ID.
Here is the program:
data have;
input id var1 var2;
datalines;
8250 1 23987
8250 51 4733
8250 45 13841
8250 71 4109
8250 24 10031
8250 54 8849
8250 88 8284
8250 79 13223
14984 10 96
14984 55 195
14984 26 166
14984 84 141
14984 40 127
14984 83 184
14984 17 122
14984 80 1695
14984 96 120
;
run;
data _null_;
retain max 0;
do i = 1 by 1 until(last.id);
set have end = eof;
by id;
end;
if i > max then max = i;
if eof then call symputx('groupsize', max);
run;
%let h = 5;
data want;
array k[&groupsize] _temporary_;
array m[&groupsize] _temporary_;
length var1_2 $50;
do i = 1 by 1 until(last.id);
set have;
by id;
k[i] = var2;
m[i] = var1;
if first.id then call missing(var1_2);
end;
call catx(' ', var1_2, id);
do i = 1 to &h;
b = largest(i, of k[*]);
v = m[whichn(b, of k[*])];
call catx(' ', var1_2, v);
end;
keep var1_2;
run;
I hope you will not have any problem to understand the program and this will be an acceptable solution to you.
Edited . . . . .
In the above program ID goes as part of new variable, VAR1_2. If you don't want it, delete
call catx(' ', var1_2, id);
and change the last statement to
keep ID var1_2;
If you don't want a general program but just a simple program for the given data(where ID and VAR2 are sorted), few statements will do as in:
data want1;
do i = 1 by 1 until(last.id);
set have;
by id;
length var1_2 $50;
if i < 6 then call catx(' ', var1_2, var1);
else continue;
end;
keep id var1_2;
run;
So you want the max five value for each ID ? data have; infile cards expandtabs truncover; input id var1 var2; datalines; 8250 1 23987 8250 51 4733 8250 45 13841 8250 71 4109 8250 24 10031 8250 54 8849 8250 88 8284 8250 79 13223 14984 10 96 14984 55 195 14984 26 166 14984 84 141 14984 40 127 14984 83 184 14984 17 122 14984 80 1695 14984 96 120 ; run; proc sort data=have; by id descending var2; run; data temp; set have; by id; if first.id then n=0; n+1; if n le 5; run; data want; length var1_2 $ 2000; do until(last.id); set temp; by id; var1_2=catx(' ',var1_2,var1); end; keep id var1_2; run;
You seemed to have not gone through my answers. Just see the end of my answer.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.