BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
madsgregers
Calcite | Level 5

Hi,

 

I'm having trouble creating a new variable. 

Imagine this fictive dataset: 

 

ID     X     Y     Z

1     23     3     1

2     12     2     0

3     10     3     0

1     80     6     1

2     09     2     0

1     78     7     1

4     14     6     0

5     67     9     1

6     12     5     1

1     89     4     0

 

I want to create a new variable named 'count' that counts how many times the same ID number appears. Besides that, the new variable 'count' should only count those ID numbers with the variable Z = 1. So in this dataset the variable 'Count' would look like:

 

ID     X     Y     Z     COUNT

1     23     3     1     3

2     12     2     0     0

3     10     3     0     0

1     80     6     1     3

2     09     2     0     0

1     78     7     1     3

4     14     6     0     0

5     67     9     1     1

6     12     5     1     1

1     89     4     0     3

 

Thanks! 

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;

data want (drop = rc);
   if _N_ = 1 then do;
      declare hash h ();
      h.definekey ('ID');
      h.definedata ('ID', 'count');
      h.definedone();

      do until (lr);
         set have end=lr;
         if Z = 1 then do;
            if h.find () ne 0 then count = 1;
            else                   count + 1;
            h.replace ();
         end;
      end;
   end;

   set have;

   count = 0;
   rc = h.find();
run;

Result:

 

ID  X   Y  Z  count 
1   23  3  1  3 
2   12  2  0  0 
3   10  3  0  0 
1   80  6  1  3 
2   9   2  0  0 
1   78  7  1  3 
4   14  6  0  0 
5   67  9  1  1 
6   12  5  1  1 
1   89  4  0  3 

View solution in original post

5 REPLIES 5
PeterClemmensen
Tourmaline | Level 20
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;

data want (drop = rc);
   if _N_ = 1 then do;
      declare hash h ();
      h.definekey ('ID');
      h.definedata ('ID', 'count');
      h.definedone();

      do until (lr);
         set have end=lr;
         if Z = 1 then do;
            if h.find () ne 0 then count = 1;
            else                   count + 1;
            h.replace ();
         end;
      end;
   end;

   set have;

   count = 0;
   rc = h.find();
run;

Result:

 

ID  X   Y  Z  count 
1   23  3  1  3 
2   12  2  0  0 
3   10  3  0  0 
1   80  6  1  3 
2   9   2  0  0 
1   78  7  1  3 
4   14  6  0  0 
5   67  9  1  1 
6   12  5  1  1 
1   89  4  0  3 
Kurt_Bremser
Super User

Also using a hash, but with a preliminary step to create the sums:

data have;
input ID $ X Y Z;
datalines;
1     23     3     1
2     12     2     0
3     10     3     0
1     80     6     1
2     09     2     0
1     78     7     1
4     14     6     0
5     67     9     1
6     12     5     1
1     89     4     0
;

proc summary data=have (where=(z = 1));
class id;
output out=sum (drop=_type_ rename=(_freq_=count));
run;

data want;
set have;
if 0 then set sum;
if _n_ = 1 
then do;
  declare hash h (dataset:"sum");
  h.definekey("id");
  h.definedata("count");
  h.definedone();
end;
if h.find() ne 0 then count = 0;
run;
madsgregers
Calcite | Level 5
Thanks a lot to both of you!
Kurt_Bremser
Super User

Note that the rather complicated algorithms are only necessary to keep the original order. If that is not the case, a simple SQL will do it:

proc sql;
create table want as
select *, sum(case when z = 1 then 1 else 0 end) as count
from have
group by id;
quit;

And you can keep the original order if you create a column to order by first:

data have;
input ID $ X Y Z;
ord = _n_;
datalines;
1     23     3     1
2     12     2     0
3     10     3     0
1     80     6     1
2     09     2     0
1     78     7     1
4     14     6     0
5     67     9     1
6     12     5     1
1     89     4     0
;

proc sql;
create table want as
select id, x, y, z, sum(case when z = 1 then 1 else 0 end) as count
from have
group by id
order by ord;
quit;

Ksharp
Super User
data have;
input ID X Y Z;
datalines;
1 23 3 1
2 12 2 0
3 10 3 0
1 80 6 1
2 09 2 0
1 78 7 1
4 14 6 0
5 67 9 1
6 12 5 1
1 89 4 0
;
proc sql;
create table want as
select *,
coalesce((select count(*) from have where z=1 and id=a.id),0) as count
 from have as a;
quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 1270 views
  • 2 likes
  • 4 in conversation