Hello
In the following data set there are 6 rows for each ID.
I need to create a data set that contain 1 row for each customer and have information of #months that Y was lower than 3000.
Expected results:
for ID=1 we will get 0
for ID=1 we will get 1
for ID=1 we will get 4
What is the way to do it please?
Data tbl;
input ID Y;
cards;
1 3000
1 3400
1 3100
1 3200
1 3300
1 3300
2 3500
2 2900
2 3100
2 3100
2 3200
2 3300
3 3200
3 2900
3 2800
3 2800
3 2950
3 3200
;
run;
proc sql;
create table want as
select ID, sum(Y<3000) as Ylower3000
from tbl
group by ID;
quit;
Result:
ID Ylower3000 1 0 2 1 3 4
What is the data set way to do it?
I know to do it via proc sql but don't know via data set way
You can do it with a data step like this
data want(drop=Y);
do until (last.id);
set tbl;
by id;
if Y < 3000 then Ylower3000+1;
end;
output;
Ylower3000=0;
run;
Of course, this approach relies on the input data to be sorted by ID. If it is not, then you can do something like this
Data tbl;
input ID Y;
cards;
1 3000
1 3200
1 3300
1 3300
3 2800
3 2800
3 2950
3 3200
2 3500
2 2900
2 3100
2 3100
2 3200
1 3400
1 3100
2 3300
3 3200
3 2900
;
run;
data _null_;
if _N_=1 then do;
declare hash h(ordered:'Y');
h.definekey('ID');
h.definedata('ID', 'Ylower3000');
h.definedone();
end;
set tbl end=lr;
if h.check() ne 0 then do;
Ylower3000=0;
h.add();
end;
if Y<3000 then do;
rc=h.find();
Ylower3000+1;
rc=h.replace();
end;
if lr then h.output(dataset:'want');
run;
Hi,
Just for fun. I think that direct addressing also will do in this case 🙂
Bart
Data tbl;
input ID Y;
cards;
1 3000
1 3200
1 3300
1 3300
3 2800
3 2800
3 2950
3 3200
2 3500
2 2900
2 3100
2 3100
2 3200
1 3400
1 3100
2 3300
3 3200
3 2900
5 3200
5 2900
;
run;
%let size = 10000;
data want;
array t[&size.] _temporary_;
min = &size.; max = 0;
do until(eof);
set tbl end=eof;
t[id] + (Y<3000);
min = min><id;
max = max<>id;
end;
do id = min to max;
Ylower3000 = t[id];
put _all_;
if Ylower3000 > .z then output;
end;
stop;
keep id Ylower3000;
run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.