Good afternoon fellow SAS users and the overall SAS community.
If anyone has some time to spare, I have a question concerning counting observations.
I have data in the form of person-weeks and I want to count blocks of continuous weeks without a gap.
Here is some sample data.
claimant | Week | Episode_new |
1 | 1 | 1 |
1 | 2 | 1 |
1 | 3 | 1 |
1 | 5 | 2 |
1 | 6 | 2 |
1 | 8 | 3 |
1 | 10 | 4 |
2 | 1 | 1 |
3 | 1 | 1 |
3 | 2 | 1 |
3 | 4 | 2 |
3 | 5 | 2 |
As a first attempt, I tried this, but it does not get me close.
data want;
set have;
by claimant;
if first.claimant then do; episode_new=1; end;
else if week = lag(week) then do; episode_new=1; end;
else if week > lag(week) then do; episode_new+1; end;
run;
Would anyone have any suggestions about how to tackle this question? Thanks, -Carmine
Do not conditionally execute the LAG() function. It will mess up how it works. Instead it is easier if you store the lagged value into a variable that you can then test.
data have;
input claimant Week Episode_new ;
cards;
1 1 1
1 2 1
1 3 1
1 5 2
1 6 2
1 8 3
1 10 4
2 1 1
3 1 1
3 2 1
3 4 2
3 5 2
;
data want ;
set have;
by claimant;
lastweek=lag(week);
if first.claimant then episode=1;
else if week > lastweek+1 then episode+1;
run;
proc print;
run;
Episode_ Obs claimant Week new lastweek episode 1 1 1 1 . 1 2 1 2 1 1 1 3 1 3 1 2 1 4 1 5 2 3 2 5 1 6 2 5 2 6 1 8 3 6 3 7 1 10 4 8 4 8 2 1 1 10 1 9 3 1 1 1 1 10 3 2 1 1 1 11 3 4 2 2 2 12 3 5 2 4 2
What would the "answer" look like for these 3 claimants? Claimant 1 has two sets of weeks that are back to back? So would Claimant 1 have a value of 2 for this variable you are trying to compute?
Do not conditionally execute the LAG() function. It will mess up how it works. Instead it is easier if you store the lagged value into a variable that you can then test.
data have;
input claimant Week Episode_new ;
cards;
1 1 1
1 2 1
1 3 1
1 5 2
1 6 2
1 8 3
1 10 4
2 1 1
3 1 1
3 2 1
3 4 2
3 5 2
;
data want ;
set have;
by claimant;
lastweek=lag(week);
if first.claimant then episode=1;
else if week > lastweek+1 then episode+1;
run;
proc print;
run;
Episode_ Obs claimant Week new lastweek episode 1 1 1 1 . 1 2 1 2 1 1 1 3 1 3 1 2 1 4 1 5 2 3 2 5 1 6 2 5 2 6 1 8 3 6 3 7 1 10 4 8 4 8 2 1 1 10 1 9 3 1 1 1 1 10 3 2 1 1 1 11 3 4 2 2 2 12 3 5 2 4 2
Thanks Tom. I will dissect your code in an effort to understand it better and add it to my SAS toolkit.
All the best!
-Carmine
@Tom :
"Do not conditionally execute the LAG() function. It will mess up how it works."
Could you please explain how calling the LAG function (or DIF function, for that matter) conditionally can "mess up how it works"?
If the condition is true, it is executed, and if it is false, it is not executed; that's all. From what I've seen, any problems with calling the queue functions conditionally arise from misunderstanding how they really work. In particular, it stems from the idea that LAGn is designed to "return the value from the nth previous observation"; and I've been amazed to encounter this bizarre concept somehow firmly planted even in the heads of truly stellar SAS programmers, especially since the SAS documentation clearly states that LAG "returns values from a queue". And indeed, LAGn is a simple static FIFO queue with n fixed items, returning the item from its front when an item is inserted into its rear, and its action has nothing to do with any "observations" whatsoever.
Of course, LAGn can be used to return the value from the nth previous observation, and perhaps the fact that it's used for this purpose most often has given rise to the idea that that is what the function is designed to do. However, it can be also used for other purposes whenever utilizing a fixed queue as a data structure suits one's needs. For example, LAG (a 1-item queue) can be used in lieu of a temporary variable.
Also, methinks the name of the function itself, LAG, is chosen awry and adds to the misunderstanding. If it were called, for example, just using the word describing what it really does, i.e. QUEUE, it could make one pause and think of it in a less deviated direction.
Kind regards
Paul D.
The way that LAG (and DIF) are implemented is by making a "stack" of the values when the function executes. So if you skip executing the function on one observation then the value from that observation is never put onto the stack and so that value can never be recalled.
Of course; but it's only dictated by the nature of this particular task. In different situations, one may want to enqueue/dequeue LAG only when certain conditions are met. In any case, the function itself operates the same way regardless of whether it's part of a conditional clause or not.
Kind regards
Paul D.
p.s. LAG is not implemented as a stack (LIFO) but as a queue (FIFO). Because its size (the number of items) is fixed, the enque and dequeue operations are performed simultaneously, i.e. when a new item is inserted in the rear of the queue, the front item is ejected. The hash object can be used to implement a dynamic queue where (a) the enqueue and dequeue operations are independent and (b) the current number of items is the difference between the number of times the two operations, respectively, have been performed.
Your idea of using a queue is sound, only since you need to look at the difference between the current week value and the one from the prior record, the queue function DIF is a more suitable tool than LAG:
data have ;
input claimant Week ;
cards ;
1 1
1 2
1 3
1 5
1 6
1 8
1 10
2 1
3 1
3 2
3 4
3 5
run ;
data want ;
set have ;
by claimant ;
if first.claimant then episode_new = 1 ;
else if dif (week) > 1 then episode_new + 1 ;
run ;
On the other hand, you are saying that you want is to "count blocks of continuous weeks without a gap" for each claimant. Of course, the last record in each BY group in the file WANT gives you what you need. However, if you are only interested in the pure count rather than creating the variable EPISODE_NEW, you can either add the statement:
if last.claimant ;
to the code above or do it differently from the outset:
data want (keep = claimant count) ;
do until (last.claimant) ;
set have ;
by claimant ;
length _s $ 32767 ;
substr (_s, week, 1) = "x" ;
end ;
Count = countw (_s) ;
run ;
Kind regards
Paul D.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.