DATA Step, Macro, Functions and more

running count by group

Accepted Solution Solved
Reply
Regular Contributor
Posts: 196
Accepted Solution

running count by group

I sorted by id and date (mm/dd/yy).

I need to create a running count by id and job (while keeping the dates in order).

How do I create the variable count?

id date job count

1 1/1/10 x 1

1 1/15/10 y 2

1 2/2/10 y 2

1 4/1/10 x 3

1 5/1/10 x 3

1 4/2/11 z 4

2 4/3/09 x 1

2 4/6/09 m 2

2 5/8/09 m 2

2 6/8/10 m 2

2 7/8/10 y 3

2 7/8/11 m 4

2 8/8/11 m 4


Accepted Solutions
Solution
‎06-28-2011 02:56 AM
Super User
Posts: 9,681

Re: running count by group

If that obs were 4,then it is easy.

data temp;
input id date : mmddyy10. job  $;
format date mmddyy10.;
cards;
1 1/1/10 x 
1 1/15/10 y 
1 2/2/10 y 
1 4/1/10 x 
1 5/1/10 x 
1 4/2/11 z 
2 4/3/09 x 
2 4/6/09 m 
2 5/8/09 m 
2 6/8/10 m 
2 7/8/10 y 
2 7/8/11 m 
;
run; 
data want;
 set temp;
 if id ne lag(id) then count=0;
 if job ne lag(job) then count+1;
run;

Ksharp

Message was edited by: xia keshan

View solution in original post


All Replies
Super User
Posts: 9,681

running count by group

id date job count

1 1/1/10 x 1

1 1/15/10 y 2

1 2/2/10 y 2

1 4/1/10 x 3

1 5/1/10 x 3

1 4/2/11 z 1

2 4/3/09 x 1

2 4/6/09 m 2

2 5/8/09 m 2

2 6/8/10 m 2

2 7/8/10 y 3

2 7/8/11 m 4

2 8/8/11 m 4

Why this obs 's count is 1 not 4? There are some other logic?

Ksharp

Solution
‎06-28-2011 02:56 AM
Super User
Posts: 9,681

Re: running count by group

If that obs were 4,then it is easy.

data temp;
input id date : mmddyy10. job  $;
format date mmddyy10.;
cards;
1 1/1/10 x 
1 1/15/10 y 
1 2/2/10 y 
1 4/1/10 x 
1 5/1/10 x 
1 4/2/11 z 
2 4/3/09 x 
2 4/6/09 m 
2 5/8/09 m 
2 6/8/10 m 
2 7/8/10 y 
2 7/8/11 m 
;
run; 
data want;
 set temp;
 if id ne lag(id) then count=0;
 if job ne lag(job) then count+1;
run;

Ksharp

Message was edited by: xia keshan

Valued Guide
Posts: 765

Re: running count by group

Hi ... try some FIRST. variables ...

* leave C1 in the data set to see if logic worked;

data test;

input id date : mmddyy. job : $1. c1;

format date mmddyy10.;

datalines;

1 1/1/10 x 1

1 1/15/10 y 2

1 2/2/10 y 2

1 4/1/10 x 3

1 5/1/10 x 3

1 4/2/11 z 4

2 4/3/09 x 1

2 4/6/09 m 2

2 5/8/09 m 2

2 6/8/10 m 2

2 7/8/10 y 3

2 7/8/11 m 4

2 8/8/11 m 4

;

run;

* create C2 (should be same as C1);

data test;

set test;

by id job notsorted;

c2+(-c2*first.id);

c2+first.job;

run;

* hey, it is !!! ;

proc print data=test;

run;

id         date    job    c1    c2

1    01/01/2010     x      1     1

1    01/15/2010     y      2     2

1    02/02/2010     y      2     2

1    04/01/2010     x      3     3

1    05/01/2010     x      3     3

1    04/02/2011     z      4     4

2    04/03/2009     x      1     1

2    04/06/2009     m      2     2

2    05/08/2009     m      2     2

2    06/08/2010     m      2     2

2    07/08/2010     y      3     3

2    07/08/2011     m      4     4

2    08/08/2011     m      4     4

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 1872 views
  • 3 likes
  • 3 in conversation