Hi,
I'm having trouble creating a new variable.
Imagine this fictive dataset:
ID X Y Z
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
I want to create a new variable named 'count' that counts how many times the same ID number appears. Besides that, the new variable 'count' should only count those ID numbers with the variable Z = 1. So in this dataset the variable 'Count' would look like:
ID X Y Z COUNT
1 23 3 1 3
2 12 2 0 0
3 10 3 0 0
1 80 6 1 3
2 09 2 0 0
1 78 7 1 3
4 14 6 0 0
5 67 9 1 1
6 12 5 1 1
1 89 4 0 3
Thanks!
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
data want (drop = rc);
if _N_ = 1 then do;
declare hash h ();
h.definekey ('ID');
h.definedata ('ID', 'count');
h.definedone();
do until (lr);
set have end=lr;
if Z = 1 then do;
if h.find () ne 0 then count = 1;
else count + 1;
h.replace ();
end;
end;
end;
set have;
count = 0;
rc = h.find();
run;
Result:
ID X Y Z count 1 23 3 1 3 2 12 2 0 0 3 10 3 0 0 1 80 6 1 3 2 9 2 0 0 1 78 7 1 3 4 14 6 0 0 5 67 9 1 1 6 12 5 1 1 1 89 4 0 3
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
data want (drop = rc);
if _N_ = 1 then do;
declare hash h ();
h.definekey ('ID');
h.definedata ('ID', 'count');
h.definedone();
do until (lr);
set have end=lr;
if Z = 1 then do;
if h.find () ne 0 then count = 1;
else count + 1;
h.replace ();
end;
end;
end;
set have;
count = 0;
rc = h.find();
run;
Result:
ID X Y Z count 1 23 3 1 3 2 12 2 0 0 3 10 3 0 0 1 80 6 1 3 2 9 2 0 0 1 78 7 1 3 4 14 6 0 0 5 67 9 1 1 6 12 5 1 1 1 89 4 0 3
Also using a hash, but with a preliminary step to create the sums:
data have;
input ID $ X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
proc summary data=have (where=(z = 1));
class id;
output out=sum (drop=_type_ rename=(_freq_=count));
run;
data want;
set have;
if 0 then set sum;
if _n_ = 1
then do;
declare hash h (dataset:"sum");
h.definekey("id");
h.definedata("count");
h.definedone();
end;
if h.find() ne 0 then count = 0;
run;
Note that the rather complicated algorithms are only necessary to keep the original order. If that is not the case, a simple SQL will do it:
proc sql;
create table want as
select *, sum(case when z = 1 then 1 else 0 end) as count
from have
group by id;
quit;
And you can keep the original order if you create a column to order by first:
data have;
input ID $ X Y Z;
ord = _n_;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
proc sql;
create table want as
select id, x, y, z, sum(case when z = 1 then 1 else 0 end) as count
from have
group by id
order by ord;
quit;
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
proc sql;
create table want as
select *,
coalesce((select count(*) from have where z=1 and id=a.id),0) as count
from have as a;
quit;
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.