The above link addresses the question, "How can I create an enumeration variable by groups?" The last paragraph states that it is not difficult to create an enumeration variable by groups with multiple layers, but I am having problems doing so.
My data is organized by an ID variable, a time variable (year), and then an intra-time variable (quarter). But the data contains gaps. For example, it may be like the following:
ID | Year | Quarter |
---|---|---|
1 | 2000 | 1 |
1 | 2000 | 2 |
1 | 2004 | 1 |
I want to count how many consecutive quarterly observations I have. I tried following the sample code in the link above:
data two1;
set two;
count + 1;
by id year quarter;
if first.id or first.year or first.quarter then count = 1;
run;
But that did not do it for me. I think a problem I have is that there could be a consecutive string from the fourth quarter of year t - 1 to the first quarter of year t. My output was wrong regardless.
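One likely reason the attempt above returns 1 on every row: when each ID-year-quarter combination appears only once, first.quarter is true for every observation, so the reset fires each time. A minimal sketch to see this, assuming the input data set two is already sorted by id, year, and quarter (the output name check and the f_ variables are made up for illustration):

```sas
data check;
  set two;
  by id year quarter;
  /* FIRST. variables are temporary, so copy them to keep them in the output */
  f_id      = first.id;
  f_year    = first.year;
  f_quarter = first.quarter;
run;

/* If every id-year-quarter combination is unique, f_quarter is 1 on every row */
proc print data=check;
run;
```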
Can you show us what you want/expect count to look like?
Art
ID | Year | Quarter | Count |
---|---|---|---|
1 | 2000 | 1 | 1 |
1 | 2000 | 2 | 2 |
1 | 2000 | 3 | 3 |
1 | 2000 | 4 | 4 |
1 | 2001 | 1 | 5 |
1 | 2004 | 2 | 1 |
2 | 1999 | 4 | 1 |
2 | 2000 | 1 | 2 |
Art, that's what I have in "mind." I want to count how many consecutive quarterly observations I have per ID. My data is organized by ID, year, and quarter, but there may be gaps.
I don't have access to SAS at the moment, so I can only offer untested pseudocode, which may well be wrong. That said, my general approach would be to create a pseudo date and then use the intck function with a lag to decide whether to increment the counter. For example:
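data two1;
set two;
by id;
/* Build a date in the last month of each quarter (Q1 -> March, Q2 -> June, ...) */
pseudodate=mdy(quarter*3,1,year);
/* Call LAG on every row so its queue stays aligned with the data */
lastdate=lag(pseudodate);
if first.id then count = 1;
else do;
/* Exactly one quarter boundary since the previous row: extend the run */
if intck('qtr',lastdate,pseudodate) eq 1 then count+1;
else count=1;
end;
run;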
Hopefully, that will give you enough direction to actually solve your problem.
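A self-contained sketch of the same idea, for anyone who wants to run it. The sample rows are copied from the expected-output table earlier in the thread. Using yyq() instead of mdy() to build the quarter date is a substitution made here, and the names qdate and prev are illustrative only.

```sas
/* Sample data from the expected-output table in this thread */
data two;
  input id year quarter;
  datalines;
1 2000 1
1 2000 2
1 2000 3
1 2000 4
1 2001 1
1 2004 2
2 1999 4
2 2000 1
;
run;

proc sort data=two;
  by id year quarter;
run;

data two1;
  set two;
  by id;
  qdate = yyq(year, quarter);  /* SAS date for the first day of the quarter */
  prev  = lag(qdate);          /* LAG is called on every row, never conditionally */
  if first.id then count = 1;
  else if intck('qtr', prev, qdate) = 1 then count + 1;  /* next quarter: extend the run */
  else count = 1;                                        /* gap: start a new run */
  drop qdate prev;
run;
```

Because intck counts quarter boundaries between two dates, the step from the fourth quarter of one year to the first quarter of the next counts as 1, just like any other adjacent pair.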
Art
How about:
data temp;
infile datalines expandtabs;
input ID Year Quarter;
datalines;
1 2000 1
1 2000 2
1 2000 3
1 2000 4
1 2001 1
1 2004 2
2 1999 4
2 2000 1
;
run;
proc sql noprint;
create table all as
select * from
(select distinct id from temp),
(select distinct year from temp),
(select distinct quarter from temp);
quit;
proc sort data=temp;
by id year quarter;
run;
proc sort data=all;
by id year quarter;
run;
data op;
merge all temp(in=in_temp);
by id year quarter;
if in_temp then flag=1;
run;
data want(where=(flag is not missing));
set op;
if missing(flag) or id ne lag(id) then count=0;
if not missing(flag) then count+1;
run;
Ksharp
Message was edited by: xia keshan
Hi Yanagi,
You can use the LAG function to bring the previous period's date onto the current row, and a BY statement to build the count column you need.
data temp;
input ID Year Quarter ;
date=mdy(quarter*3,01,year);
format date ddmmyy10.;
cards;
1 2000 1
1 2000 2
1 2000 3
1 2000 4
1 2001 1
1 2004 2
2 1999 4
2 2000 1
;
run;
proc sort data=temp;
by id date;
run;
data temp2;
set temp end=eof;
by id date;
date_prev=lag(date);
if first.id then do;
count=1;
date_prev=date;
end;
else do;
if intck('quarter',date_prev,date)=1 then count+1;
else count=1;
end;
format date_prev ddmmyy10.;
run;
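The consecutive-quarter test can also be written with plain integer arithmetic instead of dates. This is a sketch of an alternative, not something proposed in the thread: it assumes the data set temp from the step above, already sorted by id and date, and the names temp3, qindex, and prev_q are hypothetical.

```sas
data temp3;
  set temp;
  by id;
  /* Running quarter index: 2000 Q4 -> 8004, 2001 Q1 -> 8005 */
  qindex = year*4 + quarter;
  prev_q = lag(qindex);        /* called on every row so the queue stays aligned */
  /* New ID: start at 1; otherwise extend only when the index advanced by exactly 1 */
  if first.id then count = 1;
  else if qindex - prev_q = 1 then count + 1;
  else count = 1;
  drop qindex prev_q;
run;
```

Since the index increases by exactly 1 from one quarter to the next, including across a year boundary, a difference of 1 between adjacent rows of the same ID means the two observations are consecutive.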