BookmarkSubscribeRSS Feed
dwhitney
Calcite | Level 5

I have a dataset in a long format where each subsequent row per ID is a new dental surgery event; folks have anywhere from 2-5 distinct surgery dates (tx_date) within 30 days of the first tx_date. The outcome is examined up to 30 days after the first tx_date. For those that developed the outcome (postop_yn=1), the subsequent dental surgeries occur before or on the same day as the outcome. So, all unique tx_dates per ID are within 30 days of the very first tx_date through the end of the 30 day follow-up or on/before developing the outcome.

 

At each surgery, a certain number of teeth get extracted (Num_teeth_per_txdate). I need help in figuring out how to code the dataset so that I get a start (day 1) to end (the day before the subsequent tx_date) per tx_date while cumulatively summing the Num_teeth_per_txdate until the last tx_date. 

 

1. One question is how to code the start/stop so that the stop is 1 day before the subsequent tx_date; right now, I am getting the stop to equal the next tx_date.

 

2. The other question is how to get the Num_teeth_per_txdate to be cumulatively summing over the tx_dates.

 

Below is the first 50 observations of the dataset. After that is the code I am struggling with. 

 

idtx_datetx_visitdays_diffNum_teeth_per_txdatepostop_ynpostop_days
12058710/3/2022102030
12058710/28/20222251030
1492683/25/2021105030
1492684/15/20212213030
1520671/28/2021103030
1520672/11/20212141030
15419812/10/2020105030
15419812/17/2020272030
1541981/6/20213273030
1701359/26/2022101030
17013510/20/20222241030
1817363/12/2024101030
1817363/19/2024271030
3098686/16/2023103030
3098687/14/20232282030
3181514/4/2023106130
3181514/18/20232149130
32371911/9/2022101030
32371911/15/2022268030
32793611/23/2020101030
32793612/22/20202291030
32905312/23/2021103030
3290531/11/20222191030
3363343/5/2024101030
3363343/27/20242223030
3377985/5/2021106030
3377985/14/2021294030
33822711/29/2023103030
33822712/19/20232204030
3409831/20/2021104030
3409831/28/2021284030
3409832/3/20213142030
34553910/3/2025103030
34553910/17/20252143030
34553910/31/20253283030
3484577/6/2023101121
3484577/17/20232113121
3491187/21/2020101030
3491187/30/2020294030
3491188/13/20203232030
34994810/12/2022101116
34994810/19/2022274116
3503821/24/2024101030
3503822/7/20242144030
3548062/23/2021102030
3548063/2/2021272030
36331011/12/2020104030
36331011/17/2020256030
3678205/20/2022105030
3678206/14/20222252030

 

Code that needs some help.

 

data v2a; set v2; 
by id; 
retain start stop tvteeth teeth; 
if first.id then do; start=1; stop=1; teeth=.; tvteeth=Num_teeth_per_txdate; end;
if tx_visit ne 1 and days_diff < postop_days then do; start=stop; stop=days_diff; teeth=lag(Num_teeth_per_txdate); tvteeth=tvteeth+teeth; event=0; output; end;
if last.id then do; start=stop; stop=postop_days; teeth=lag(Num_teeth_per_txdate); tvteeth=tvteeth+teeth; event=(postop_yn=1); output; end;
run;

 

12 REPLIES 12
PaigeMiller
Diamond | Level 26

@dwhitney wrote:

 

1. One question is how to code the start/stop so that the stop is 1 day before the subsequent tx_date; right now, I am getting the stop to equal the next tx_date.


I am mystified by this question. I feel either I have completely missed the point, or you have missed something obvious. Can you not just subtract 1 and get the quantity you want?

 

2. The other question is how to get the Num_teeth_per_txdate to be cumulatively summing over the tx_dates.

Cumulative number of teeth, you can add this into your existing SAS data step code:

 

if first.id then cum_teeth=0;
cum_teeth+num_teeth_per_txdate;

 

--
Paige Miller
dwhitney
Calcite | Level 5

Hi Paige,

 

Thank you for the reply!

 

1. I can easily subtract 1 from STOP, but in a follow up DATA STEP as I can isolate the ID order number. In the current DATA STEP, if I subtract or add 1 to either the START or STOP for lines "if tx_visit ne 1..." and "if last.id...", it does this calculation on rows that I DO and do NOT want it to. For example,  the below code (bolded segments are the revision) yields the correct START/STOP for the first and last ID rows, but not middle rows for those with 3+ tx_dates (see last ID in below table; I bolded the incorrect row).

 

data v2a; set v2; 
by id; 
retain start stop tvteeth teeth; 
if first.id then cum_teeth=0; cum_teeth+num_teeth_per_txdate;
if first.id then do; start=1; stop=1; teeth=.; tvteeth=Num_teeth_per_txdate; end;
if tx_visit ne 1 and days_diff < postop_days then do; start=stop; stop=days_diff-1; teeth=lag(Num_teeth_per_txdate); tvteeth=tvteeth+teeth; event=0; output; end;
if last.id then do; start=stop+1; stop=postop_days; teeth=lag(Num_teeth_per_txdate); tvteeth=tvteeth+teeth; event=(postop_yn=1); output; end;
run;
 
idtx_datedays_diffstartstop
12058710/3/20220124
12058710/28/2022252530
1492683/25/20210120
1492684/15/2021212130
1520671/28/20210113
1520672/11/2021141430
15419812/10/2020016
15419812/17/20207626
1541981/6/2021272730

 

 

 

2. I have added your suggested code in several few different places, but it is not getting the cumulative sum, and I am not sure why. Below is the first 9 observations where the cum_teeth variable is showing what it SHOULD be, but I can't get that. Any suggestions? Below is the last version of the code I tried.

 

idtx_datestartstopcum_teeth
12058710/28/20221242
12058710/28/202225303
1492684/15/20211205
1492684/15/202121308
1520672/11/20211133
1520672/11/202114304
15419812/17/2020165
1541981/6/20216267
1541981/6/2021273010

 

data v2a; set v2; 
by id; 
retain start stop teeth cum_teeth; 
if first.id then do; start=1; stop=1; cum_teeth=0; end;
if tx_visit ne 1 and days_diff < postop_days then do; start=stop; stop=days_diff-1; cum_teeth+num_teeth_per_txdate; event=0; output; end;
if last.id then do; start=stop+1; stop=postop_days; cum_teeth+num_teeth_per_txdate; event=(postop_yn=1); output; end;
run;
PaigeMiller
Diamond | Level 26

Regarding #2, the summing to get cum_teeth should not be inside an IF statement. That line must execute on every record. 

 

Regarding #1, I am going to request that you provide code inside a code box (click on the little running man icon and paste your code in there) with one command per line, and with proper indentation.

 

And in fact, it is also requested (by me, if you want me to run code on your data) to provide the data as working SAS data step code (examples and instructions).

--
Paige Miller
dwhitney
Calcite | Level 5
data v2a;  set v2; 

by id; 

retain start stop cum_teeth; 

if first.id then do; start=1; stop=1; cum_teeth=0; end;

if tx_visit ne 1 and days_diff < postop_days then do; start=stop; stop=days_diff-1; event=0; output; end;

if last.id then do; start=stop+1; stop=postop_days; event=(postop_yn=1); output; end;

cum_teeth+num_teeth_per_txdate;
			
run;

Above is the code that is still not working to get the cumulative teeth.

 

Below is the working SAS data step code. I am only taking the first ~50 observations.

 

data 50.V2;
infile datalines dsd truncover;
input id:BEST. tx_date:MMDDYY10. tx_visit:32. days_diff:32. Num_teeth_per_txdate:32. postop_yn:32. postop_days:32.;
format id BEST. tx_date MMDDYY10.;
label id="ID" tx_date="Tx_date";
datalines4;
120587,10/03/2022,1,0,2,0,30
120587,10/28/2022,2,25,1,0,30
149268,03/25/2021,1,0,5,0,30
149268,04/15/2021,2,21,3,0,30
152067,01/28/2021,1,0,3,0,30
152067,02/11/2021,2,14,1,0,30
154198,12/10/2020,1,0,5,0,30
154198,12/17/2020,2,7,2,0,30
154198,01/06/2021,3,27,3,0,30
170135,09/26/2022,1,0,1,0,30
170135,10/20/2022,2,24,1,0,30
181736,03/12/2024,1,0,1,0,30
181736,03/19/2024,2,7,1,0,30
309868,06/16/2023,1,0,3,0,30
309868,07/14/2023,2,28,2,0,30
318151,04/04/2023,1,0,6,1,30
318151,04/18/2023,2,14,9,1,30
323719,11/09/2022,1,0,1,0,30
323719,11/15/2022,2,6,8,0,30
327936,11/23/2020,1,0,1,0,30
327936,12/22/2020,2,29,1,0,30
329053,12/23/2021,1,0,3,0,30
329053,01/11/2022,2,19,1,0,30
336334,03/05/2024,1,0,1,0,30
336334,03/27/2024,2,22,3,0,30
337798,05/05/2021,1,0,6,0,30
337798,05/14/2021,2,9,4,0,30
338227,11/29/2023,1,0,3,0,30
338227,12/19/2023,2,20,4,0,30
340983,01/20/2021,1,0,4,0,30
340983,01/28/2021,2,8,4,0,30
340983,02/03/2021,3,14,2,0,30
345539,10/03/2025,1,0,3,0,30
345539,10/17/2025,2,14,3,0,30
345539,10/31/2025,3,28,3,0,30
348457,07/06/2023,1,0,1,1,21
348457,07/17/2023,2,11,3,1,21
349118,07/21/2020,1,0,1,0,30
349118,07/30/2020,2,9,4,0,30
349118,08/13/2020,3,23,2,0,30
349948,10/12/2022,1,0,1,1,16
349948,10/19/2022,2,7,4,1,16
350382,01/24/2024,1,0,1,0,30
350382,02/07/2024,2,14,4,0,30
354806,02/23/2021,1,0,2,0,30
354806,03/02/2021,2,7,2,0,30
363310,11/12/2020,1,0,4,0,30
363310,11/17/2020,2,5,6,0,30
367820,05/20/2022,1,0,5,0,30
367820,06/14/2022,2,25,2,0,30
369007,12/01/2025,1,0,5,0,30
369007,12/18/2025,2,17,5,0,30

;;;;

PaigeMiller
Diamond | Level 26

Please comply with my request, with emphasis in red.

 

Regarding #1, I am going to request that you provide code inside a code box (click on the little running man icon and paste your code in there) with one command per line, and with proper indentation.

 

 

--
Paige Miller
dwhitney
Calcite | Level 5

Apologies, I must not know what proper indentation is. I googled it and tried the below. 

 

data v2a;  
    set v2; 
        by id; 
            retain start stop cum_teeth; 
            if first.id then do; 
                start=1; stop=1; cum_teeth=0; 
            end;
            if tx_visit ne 1 and days_diff < postop_days then do; 
                start=stop; stop=days_diff-1; event=0; 
           output; 
           end;
           if last.id then do; 
               start=stop+1; stop=postop_days; event=(postop_yn=1); 
           output; 
           end;
        cum_teeth+num_teeth_per_txdate;		
run;
PaigeMiller
Diamond | Level 26

You still have three commands on a single line.

 

Nevertheless, this is much more workable, so thank you.  Here is my solution.

 

data v2a;  
    set v2; 
    by id; 
    prev_tx_date=lag(tx_date);
    if first.id then do; 
        days_diff=0;
        cum_teeth=0; 
    end;
    else do;
        days_diff=tx_date-prev_tx_date-1;
    end;
    cum_teeth+num_teeth_per_txdate;	
    drop prev_tx_date;
run;

 

--
Paige Miller
dwhitney
Calcite | Level 5

Ah, I see now what is meant by a command! I will do that in the future. Thank you for working me on this. 

 

Your code works well to get the cum_teeth, but not the START and STOP (in days), which I need for the time-varying proc phreg model. I tried using your code and adding START/STOP variables, but to no luck.

 

If first.ID, then START should equal 1 and the STOP should be 1 day minus the next tx_date per ID. Then for subsequent tx_dates per ID, the START should equal the diff_days (tx_date of that row - tx_date from the row before) and the STOP should equal either: (1)  1 day minus the next tx_date per ID if not the last.ID or (2) the postop_days if the last.ID. Is there a way to build that into your code?

Tom
Super User Tom
Super User

Not sure I follow your whole thread but is seems you might need to know the NEXT value of START to do what you want.   That is harder than finding the PREVIOUS value since it is much harder to predict the future than to remember the past.

 

SAS makes it easy to find the PREVIOUS value by using the LAG() function to remember the past value.

 

Here is simple trick to predict the future.  Use an additional SET statement to read in the same dataset only using FIRSTOBS=2 to skip the first observation.  I like to add a third read of just the first observation to make the two SET statements have the same number of observations.

Remember to take care not to use the NEXT value when at the end of a BY group since it comes from the following BY group.  Just like you took care not to use the PREVIOUS value at the start of the BY group since it came from an earlier BY group.

data v2a;  
    set v2; 
    by id tx_date ; 
    prev_tx_date=lag(tx_date);
    set v2(keep=tx_date rename=(tx_date=next_tx_date) firstobs=2)
        v2(obs=1 drop=_all_)
    ;
    if first.id then do; 
        days_diff=0;
        cum_teeth=0; 
    end;
    else do;
        days_diff=tx_date-prev_tx_date-1;
    end;
    if last.id then call missing(next_tx_date);
    else stop_date = next_tx_date -1 ;
    format stop_date date9.;
    cum_teeth+num_teeth_per_txdate;	
    drop prev_tx_date next_tx_date ;
run;
PaigeMiller
Diamond | Level 26

Maybe someone else can figure out how to do this in one step, but here is my solution which takes several steps

 

data v2a;  
    set v2; 
    by id; 
    prev_tx_date=lag(tx_date);
    if first.id then do; 
        days_diff=0;
        cum_teeth=0; 
        start=1;
    end;
    else do;
        days_diff=tx_date-prev_tx_date-1;
    end;
    cum_teeth+num_teeth_per_txdate;	
    drop prev_tx_date;
run;

/* Find max tx_date and min tx_date per id */
proc summary nway data=v2a;
    class id;
    var tx_date;
    output out=max_tx_date min=min_tx_date max=max_tx_date;
run;
data v2b;
    merge v2a max_tx_date(keep=max_tx_date min_tx_date id);
    by id;
run;
data v2c;
    merge v2b v2b(firstobs=2 keep=tx_date rename=(tx_date=next_tx_date));
    if days_diff=0 then do; 
        start=1;
        stop=next_tx_date-tx_date-1; 
    end;
    else do;
        if tx_date^=max_tx_date then do;
            start=tx_date-min_tx_date;
            stop=next_tx_date-min_tx_date-1; 
        end;
        else do;
            start=tx_date-min_tx_date;
            stop=postop_days;
        end;
    end;
    drop min_tx_date max_tx_date next_tx_date;
run;
--
Paige Miller
quickbluefish
Barite | Level 11

You might try something like this.  That said, I don't think I really follow the purpose of the 'post_op' variable here.

proc sql noprint;
select max(nrecs) into :maxrecs trimmed 
from 
	(select id, count(*) as nrecs from tx group by id);
quit;

%let maxrecs=%eval(&maxrecs+1);

proc sort data=tx; by id tx_date; run;

data tx2;
set tx;
by id;
array dts {&maxrecs} _temporary_;
array tth {&maxrecs} _temporary_;
length n startdt enddt 4 nteeth 3;
format startdt enddt date9.;
if first.id then call missing(of dts[*], of tth[*], n, nteeth);
n+1;
dts[n]=tx_date;
tth[n]=num_teeth_per_txdate;
if last.id then do;
	do i=1 to n;
		startdt=dts[i];
		enddt=dts[i+1]-1;
		nteeth+tth[i];
		output;
	end;
end;
keep id startdt enddt nteeth;
run;
dwhitney
Calcite | Level 5

Thank you, kindly, to everyone! Using a bit of everyone's approach, I finally got the dataset that I needed. This has been an incredibly helpful experience. Thank you, everyone!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 410 views
  • 0 likes
  • 4 in conversation