BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Carmine_Rossi
Calcite | Level 5

Good afternoon fellow SAS users and the overall SAS community.

 

If anyone has some time to spare, I have a question concerning counting observations.

 

I have data in the form of person-weeks and I want to count blocks of continuous weeks without a gap.

 

Here is some sample data. 

claimant

Week

Episode_new

1

1

1

1

2

1

1

3

1

1

5

2

1

6

2

1

8

3

1

10

4

2

1

1

3

1

1

3

2

1

3

4

2

3

5

2

 

As a first attempt, I tried this, but it does not get me close.

 

data want;

  set have;

  by claimant;

  if first.claimant then do; episode_new=1; end;

  else if week = lag(week) then do; episode_new=1; end;

  else if week > lag(week) then do; episode_new+1; end;

   run;

 

Would anyone have any suggestions about how to tackle this question? Thanks, -Carmine

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

Do not conditionally execute the LAG() function. It will mess up how it works.  Instead it is easier if you store the lagged value into a variable that you can then test. 

data have;
  input claimant Week Episode_new ;
cards;
1 1 1
1 2 1
1 3 1
1 5 2
1 6 2
1 8 3
1 10 4
2 1 1
3 1 1
3 2 1
3 4 2
3 5 2
;

data want ;
  set have;
  by claimant;
  lastweek=lag(week);
  if first.claimant then episode=1;
  else if week > lastweek+1 then episode+1;
run;

proc print;
run;
                           Episode_
Obs    claimant    Week       new      lastweek    episode

  1        1         1         1           .          1
  2        1         2         1           1          1
  3        1         3         1           2          1
  4        1         5         2           3          2
  5        1         6         2           5          2
  6        1         8         3           6          3
  7        1        10         4           8          4
  8        2         1         1          10          1
  9        3         1         1           1          1
 10        3         2         1           1          1
 11        3         4         2           2          2
 12        3         5         2           4          2

 

View solution in original post

7 REPLIES 7
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10

What would the "answer" look like for these 3 claimants? Claimant 1 has two sets of weeks that are back to back? So would Claimant 1 have a value of 2 for this variable you are trying to compute?

Tom
Super User Tom
Super User

Do not conditionally execute the LAG() function. It will mess up how it works.  Instead it is easier if you store the lagged value into a variable that you can then test. 

data have;
  input claimant Week Episode_new ;
cards;
1 1 1
1 2 1
1 3 1
1 5 2
1 6 2
1 8 3
1 10 4
2 1 1
3 1 1
3 2 1
3 4 2
3 5 2
;

data want ;
  set have;
  by claimant;
  lastweek=lag(week);
  if first.claimant then episode=1;
  else if week > lastweek+1 then episode+1;
run;

proc print;
run;
                           Episode_
Obs    claimant    Week       new      lastweek    episode

  1        1         1         1           .          1
  2        1         2         1           1          1
  3        1         3         1           2          1
  4        1         5         2           3          2
  5        1         6         2           5          2
  6        1         8         3           6          3
  7        1        10         4           8          4
  8        2         1         1          10          1
  9        3         1         1           1          1
 10        3         2         1           1          1
 11        3         4         2           2          2
 12        3         5         2           4          2

 

Carmine_Rossi
Calcite | Level 5

Thanks Tom. I will dissect your code in an effort to understand it better and add it to my SAS toolkit.

All the best!

-Carmine

hashman
Ammonite | Level 13

@Tom :

"Do not conditionally execute the LAG() function. It will mess up how it works."

 

Could you please explain how calling the LAG function (or DIF function, for that matter) conditionally can "mess up how it works"?

 

If the condition is true, it is executed, and if it is false, it is not executed; that's all. From what I've seen, any problems with calling the queue functions conditionally arise from misunderstanding how they really work. In particular, it stems from the idea that LAGn is designed to "return the value from the nth previous observation"; and I've been amazed to encounter this bizarre concept somehow firmly planted even in the heads of truly stellar SAS programmers, especially since the SAS documentation clearly states that LAG "returns values from a queue". And indeed, LAGn is a simple static FIFO queue with n fixed items, returning the item from its front when an item is inserted into its rear, and its action has nothing to do with any "observations" whatsoever.   

 

Of course, LAGn can be used to return the value from the nth previous observation, and perhaps the fact that it's used for this purpose most often has given rise to the idea that that is what the function is designed to do. However, it can be also used for other purposes whenever utilizing a fixed queue as a data structure suits one's needs. For example, LAG (a 1-item queue) can be used in lieu of a temporary variable.

 

Also, methinks the name of the function itself, LAG, is chosen awry and adds to the misunderstanding. If it were called, for example, just using the word describing what it really does, i.e. QUEUE, it could make one pause and think of it in a less deviated direction.

 

Kind regards

Paul D.     

   

Tom
Super User Tom
Super User

The way that LAG (and DIF) are implemented is by making a "stack" of the values when the function executes.  So if you skip executing the function on one observation then the value from that observation is never put onto the stack and so that value can never be recalled.

hashman
Ammonite | Level 13

Of course; but it's only dictated by the nature of this particular task. In different situations, one may want to enqueue/dequeue LAG only when certain conditions are met. In any case, the function itself operates the same way regardless of whether it's part of a conditional clause or not. 

 

Kind regards

Paul D.

 

p.s. LAG is not implemented as a stack (LIFO) but as a queue (FIFO). Because its size (the number of items) is fixed, the enque and dequeue operations are performed simultaneously, i.e. when a new item is inserted in the rear of the queue, the front item is ejected. The hash object can be used to implement a dynamic queue where (a) the enqueue and dequeue operations are independent and (b) the current number of items is the difference between the number of times the two operations, respectively, have been performed. 

hashman
Ammonite | Level 13

@Carmine_Rossi :

Your idea of using a queue is sound, only since you need to look at the difference between the current week value and the one from the prior record, the queue function DIF is a more suitable tool than LAG:

data have ;                                    
  input claimant Week ;                        
  cards ;                                      
1  1                                           
1  2                                           
1  3                                           
1  5                                           
1  6                                           
1  8                                           
1 10                                           
2  1                                           
3  1                                           
3  2                                           
3  4                                           
3  5                                           
run ;                                          
                                               
data want ;                                    
  set have ;                                   
  by claimant ;                                
  if first.claimant      then episode_new = 1 ;
  else if dif (week) > 1 then episode_new + 1 ;
run ;                                          

On the other hand, you are saying that  you want is to "count blocks of continuous weeks without a gap" for each claimant. Of course, the last record in each BY group in the file WANT gives you what you need. However, if you are only interested in the pure count rather than creating the variable EPISODE_NEW, you can either add the statement:

 

if last.claimant ;

 

to the code above or do it differently from the outset:

data want (keep = claimant count) ;
  do until (last.claimant) ;       
    set have ;                     
    by claimant ;                  
    length _s $ 32767 ;            
    substr (_s, week, 1) = "x" ;   
  end ;                            
  Count = countw (_s) ;            
run ;                              

Kind regards

Paul D.

 

 

 

  

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1028 views
  • 0 likes
  • 4 in conversation