BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
smackerz1988
Pyrite | Level 9

I have quite a difficult specification which I could really use some guidance on as if I figure out the programming logic I can apply to similar variables

 

For each subject and scheduled visit, plus the visit where ABLFL equals "Y" and PARAMCD equals "EASITOTA" if this was unscheduled, create one record with PARAMCD set to "TOTEDPS1". For each subject and visit, merge with the intermediate dataset ADAM.ADINT and apply the estimand policies as follows: If there is a record where ADINT.PARAMCD equals "RTT1L28D" or "T2SWREL" and ADINT.AVAL equals 1, then impute AVAL as the value of AVAL at the most recent record (including unscheduled visits) where AVAL is non-missing, PARAMCD equals "EASITOTA" and the corresponding values of ADINT.AVAL where ADINT.PARAMCD equals "RTT1L28D" or "T2SWREL" are both equal to 0. Else, if there is a record where ADINT.PARAMCD equals "SWNREL" and ADINT.AVAL equals 1, then set AVAL to missing. Else, set AVAL to the value of AVAL from the record where PARAMCD equals "EASITOTA" sharing the same USUBJID and AVISITN.

 

Here is a small sample of the data in data step and image form for context as I have transposed and merged the relevant ADINT.PARAMCDs mentioned in the spec so  I can cross reference with AVAL/VISIT records in creating the record (or having the groundwork to facilitate the creation at least) 

smackerz1988_1-1664894164058.png

 

data have;
input PARAMCD $ USUBJID $ VISIT $ AVAL :8. ABLFL :$ AVISITN :8. RTT1L28D :8. SWNREL :8. T2SWREL :8.;
infile datalines dlm = '|';
datalines;
EASITOTA|1001|Screening|16| |10|1|0|0
EASITOTA|1001|Day 1    |13.4| |20|0|0|0
EASITOTA|1001|Week 2   |8.1| |30|0|0|0
EASITOTA|1001|Week 4   |5.6| |40|0|0|0
EASITOTA|1001|Week 8   |6| |50|0|0|0
EASITOTA|1001|Week 12  |8.7| |60|0|0|0
EASITOTA|1002|Screening|14| |10|0|0|0
EASITOTA|1002|Day 1    |16.4|Y|20|0|0|0
EASITOTA|1002|Week 2   |13.8| |30|0|0|0
EASITOTA|1002|Week 4   |21.4| |40|1|0|0
EASITOTA|1002|Week 8   |14| |50|1|0|0
EASITOTA|1002|Week 12  |15.6| |60|1|0|0
EASITOTA|1002|Week 16  |21.4|	|70|1|0|0		
EASITOTA|1004|Screening|9.6| |10|0|0|0
EASITOTA|1004|Week 2   |8.6| |30|0|0|0
EASITOTA|1004|Week 4   |8.7| |40|0|0|0
EASITOTA|1005|Screening|11.6| |10|1|0|0
EASITOTA|1005|Day 1    |19.8|Y|20|1|0|0	
EASITOTA|1006|Screening|17.1| |10|0|0|0
EASITOTA|1006|Day 1    |15.8|Y|20|0|0|0
EASITOTA|1006|Unscheduled|6| |100|1|0|0		
EASITOTA|1007|Screening|33.3| |10|0|0|1
EASITOTA|1007|Day 1    |40.8|Y|20|0|0|1
EASITOTA|1007|Week 2   |44| |30|0|0|1
EASITOTA|1007|Week 4   |48| |40|0|0|1
EASITOTA|1007|Unscheduled|57.8| |100|0|0|1
;
run;

Here is an outline of the desired output. So it would be one created record per subject per visit with PARAMCD ="TOTEDPS1" except Unscheduled visits but the AVAL from the Unscheduled visit would still be relevant and be imputed forward and considered the prior/most recent record 

smackerz1988_0-1664901249194.png

Hopefully this makes sense. I know a lag function is probably the most suitable here but just need some extra help with the rest 

 

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

I'm still not sure I'm understanding.  But here's a bit of a stab at it, hopefully the approach will help.  I would usually approach LOCF by using RETAIN, rather than lag, something like:

 

data have;
input PARAMCD $ USUBJID $ VISIT $ AVAL :8. ABLFL :$ AVISITN :8. RTT1L28D :8. SWNREL :8. T2SWREL :8.;
infile datalines dlm = '|';
datalines;
EASITOTA|1001|Screening|16| |10|0|0|0
EASITOTA|1001|Day 1    |13.4| |20|0|0|0
EASITOTA|1001|Week 2   |8.1| |30|0|0|0
EASITOTA|1001|Week 4   |5.6| |40|0|0|0
EASITOTA|1001|Week 8   |6| |50|0|0|0
EASITOTA|1001|Week 12  |8.7| |60|0|0|0
EASITOTA|1002|Screening|14| |10|0|0|0
EASITOTA|1002|Day 1    |16.4|Y|20|0|0|0
EASITOTA|1002|Week 2   |13.8| |30|0|0|0
EASITOTA|1002|Week 4   |21.4| |40|1|0|0
EASITOTA|1002|Week 8   |14| |50|1|0|0
EASITOTA|1002|Week 12  |15.6| |60|1|0|0
EASITOTA|1002|Week 16  |21.4|	|70|1|0|0		
EASITOTA|1004|Screening|9.6| |10|0|0|0
EASITOTA|1004|Week 2   |8.6| |30|0|0|0
EASITOTA|1004|Week 4   |8.7| |40|0|0|0
EASITOTA|1005|Screening|11.6| |10|1|0|0
EASITOTA|1005|Day 1    |19.8|Y|20|1|0|0	
EASITOTA|1006|Screening|17.1| |10|0|0|0
EASITOTA|1006|Day 1    |15.8|Y|20|0|0|0
EASITOTA|1006|Unscheduled|6| |100|1|0|0		
EASITOTA|1007|Screening|33.3| |10|0|0|0
EASITOTA|1007|Day 1    |40.8|Y|20|0|0|1
EASITOTA|1007|Week 2   |44| |30|0|0|1
EASITOTA|1007|Week 4   |48| |40|0|0|1
EASITOTA|1007|Unscheduled|57.8| |100|0|0|1
;
run;

data want ;
  set have ;
  by USUBJID ;
  retain _RetainedAval ;
  if first.USUBJID then _RetainedAval=. ;
  if max(RTT1L28D,SWNREL,T2SWREL)=0 then _RetainedAval=Aval  ;
  output ;
  PARAMCD='TOTEDPS1' ;
  Aval=_RetainedAval ;
  if SWNREL=1 then Aval=. ;
  if VISIT ne 'Unschedu' then output ;
  *drop _RetainedAval ;
run ;

proc print data=want ;
run ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

View solution in original post

7 REPLIES 7
ballardw
Super User

This part of your requirement:

 the value of AVAL at the most recent record (including unscheduled visits) where AVAL is non-missing

requires some explanation as to the definition of "most recent". Since you have values that do not have any natural order in your Visit variable such as "screening", "unscheduled" and "early termination" in your first screen shot you need to provide an explicit description of exactly how to tell what qualifies as "most recent" and is the "most recent" supposed to be prior?

 

Also your request includes references to USUBJID and AVISITN, that are not defined or provided in the data.

Hint: don't reference values for things you haven't provided a variable in the example data, or use the same names for your variables in the data. If we have to make guesses as to which variable is meant then results are problematic.

 

It is often a good idea to provide the expected result for your given example data as sometimes instructions with lots of "else" are not quite written in a clear enough manner to actually determine order of programming operations. In the expected result indicate which is the "create one record" actual created record.

smackerz1988
Pyrite | Level 9

"Also your request includes references to USUBJID and AVISITN, that are not defined or provided in the data.

Hint: don't reference values for things you haven't provided a variable in the example data, or use the same names for your variables in the data. If we have to make guesses as to which variable is meant then results are problematic."

 

AVISITN is provided in the data and is pre-sorted but yes unscheduled visits by nature are not in a natural order and yeah SUBJID is an oversight on my part as SUBJID/USUBJID is interchangeable but yes confusing without that prior knowledge. Most recent is meant as the prior result it's just how the specification is stated which is out of my control.

 

"It is often a good idea to provide the expected result for your given example data as sometimes instructions with lots of "else" are not quite written in a clear enough manner to actually determine order of programming operations. In the expected result indicate which is the "create one record" actual created record."

 

I have updated the desired output to specify what the desired created record is ideally and to clarify the "unscheduled" and "early termination" issue both  would be considered terms for unscheduled events. Hope this helps I appreciate your time explaining this. I will say that I do not have a crystallised understanding of the requirements here fully due to the complexity of the spec but trying to provide as much detail as I can.

smackerz1988
Pyrite | Level 9

@bballard So after reviewing the variable names  and how they relate to each other I think I basically need to retain the value of AVAL of EASITOTA for the AVAL for TOTEDPS1 where RTT1L28D, SWNREL and T2SWREL are all 0. and if there is a value of SWREL equal to 1 then AVAL for TOTEDPS1 is set to missing. if a value of 1 occurs in the first visit e.g screening then aval is the same value.

 

I need to follow up about updating the specs as there is issues but If I could get the bare bones of how to retain the values when all 3 are 0 then that would be all that  I need.  

 

smackerz1988_0-1664913315629.png

 

smackerz1988
Pyrite | Level 9

Can't edit my original post but this is the desired output;

 

data want;
input PARAMCD $ USUBJID $ VISIT $ AVAL :8. ABLFL :$ AVISITN :8. RTT1L28D :8. SWNREL :8. T2SWREL :8.;
infile datalines dlm = '|';
datalines;
EASITOTA|1001|Screening|16| |10|0|0|0
TOTEDPS1|1001|Screening|16| |10|0|0|0
EASITOTA|1001|Day 1    |13.4| |20|0|0|0
TOTEDPS1|1001|Day 1    |13.4| |20|0|0|0
EASITOTA|1001|Week 2   |8.1| |30|0|0|0
TOTEDPS1|1001|Week 2   |8.1| |30|0|0|0
EASITOTA|1001|Week 4   |5.6| |40|0|0|0
TOTEDPS1|1001|Week 4   |5.6| |40|0|0|0
EASITOTA|1001|Week 8   |6| |50|0|0|0
TOTEDPS1|1001|Week 8   |6| |50|0|0|0
EASITOTA|1001|Week 12  |8.7| |60|0|0|0
TOTEDPS1|1001|Week 12  |8.7| |60|0|0|0
EASITOTA|1002|Screening|14| |10|0|0|0
TOTEDPS1|1002|Screening|14| |10|0|0|0
EASITOTA|1002|Day 1    |16.4|Y|20|0|0|0
TOTEDPS1|1002|Day 1    |16.4|Y|20|0|0|0
EASITOTA|1002|Week 2   |13.8| |30|0|0|0
TOTEDPS1|1002|Week 2   |13.8| |30|0|0|0
EASITOTA|1002|Week 4   |21.4| |40|1|0|0
TOTEDPS1|1002|Week 4   |13.8| |40|1|0|0
EASITOTA|1002|Week 8   |14| |50|1|0|0
TOTEDPS1|1002|Week 8   |13.8| |50|1|0|0
EASITOTA|1002|Week 12  |15.6| |60|1|0|0
TOTEDPS1|1002|Week 12  |13.8| |60|1|0|0
EASITOTA|1002|Week 16  |21.4|	|70|1|0|0
TOTEDPS1|1002|Week 16  |13.8|	|70|1|0|0		
EASITOTA|1004|Screening|9.6| |10|0|0|0
TOTEDPS1|1004|Screening|9.6| |10|0|0|0
EASITOTA|1004|Week 2   |8.6| |30|0|0|0
TOTEDPS1|1004|Week 2   |8.6| |30|0|0|0
EASITOTA|1004|Week 4   |8.7| |40|0|0|0
TOTEDPS1|1004|Week 4   |8.7| |40|0|0|0
EASITOTA|1005|Screening|11.6| |10|0|0|0
TOTEDPS1|1005|Screening|11.6| |10|0|0|0
EASITOTA|1005|Day 1    |19.8|Y|20|1|0|0	
TOTEDPS1|1005|Day 1    |11.6|Y|20|1|0|0	
EASITOTA|1006|Screening|17.1| |10|0|0|0
TOTEDPS1|1006|Screening|17.1| |10|0|0|0
EASITOTA|1006|Day 1    |15.8|Y|20|0|0|0
TOTEDPS1|1006|Day 1    |15.8|Y|20|0|0|0
EASITOTA|1006|Unscheduled|6| |100|1|0|0		
EASITOTA|1007|Screening|33.3| |10|0|0|0
TOTEDPS1|1007|Screening|33.3| |10|0|0|0
EASITOTA|1007|Day 1    |40.8|Y|20|0|0|1
TOTEDPS1|1007|Day 1    |33.3|Y|20|0|0|1
EASITOTA|1007|Week 2   |44| |30|0|0|1
TOTEDPS1|1007|Week 2   |33.3| |30|0|0|1
EASITOTA|1007|Week 4   |48| |40|0|0|1
TOTEDPS1|1007|Week 4   |33.3| |40|0|0|1
EASITOTA|1007|Unscheduled|57.8| |100|0|0|1
;
run;

 

 

 

 

Quentin
Super User

I'm having a hard time understanding this problem.  I think partly it's because the variable names don't have meaning to me.  I wonder, would it be possible to make a smaller example of this problem?  With perhaps fewer variables, with  names that explain what they are?  

 

Big picture, what is the goal?  Looks like your making new rows for TOTEDPS1, which I assume some type of "total", but in your WANT data the total row is often the same as the EASITOTA row.  

 

Since you understand the context, I would try making a smaller example, with variable names that are not context-specific, that would allow you to describe the logic clearly.  That should make it easier for people to give advice on programming approaches.  And then you can always take that approach and apply it to your real data.

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.
smackerz1988
Pyrite | Level 9

Hello,

 

Okay let me try sub setting to two subjects. So it's basically a conditional Last observation carried forward approach. USUBJID 1001 because all the events (RTT1L28,SWNREL,T2SWREL) did not occur they all have results of  0 so the result (AVAL) for that subject at that visit is retained for the AVAL for the new created record (TOTEDPS1). USUBJID 1002 follows the same pattern until week 4 where an event has occurred for RTT1L28 indicated by a value of 1. So we have to retain the most recent visit where all the events (RTT1L28,SWNREL,T2SWREL) = 0 so in this case Week 2 and that week 2 AVAL (13.8) will be used for every other visits (Week 4, 8,12,16)  because of that RTT1L28 value of 1 being present for all those visits (I've realised Week 16 in the data previously provided should also be 13.8- apologies). If a value of SWNREL had a value of 1 at a particular visit then the TOTEDPS1 record would be created but would be set to missing. If an unscheduled visit occurred then the TOTEDPS1 record would not be created but is still considered as a value to be carried forward. For example, if the AVAL (say 4) had all the events (RTT1L28,SWNREL,T2SWREL) = 0  and then subsequent visit (for example Week 4) had all  RTT1L28,T2SWREL = 1 then the Unscheduled AVAL of 4 would be imputed/carried forward as the most recent value. Hopefully that makes sense. Here is a rough attempt at coding 

data TOTEDPS1 (where= (paramcd= 'EASITOTA') keep= subjid ablfl paramcd aval_ visit visitnum aval rtt1l28d t2swrel swnrel avisitn lag_value);
   length paramcd $ 20 ;
   set adeasint_m;
	by usubjid avisitn;
	array columns {3} rtt1l28d t2swrel swnrel;
    
	 
	lag_value=lag(aval);
 
    do i = 1 to dim(columns);
    if abs(columns {i}) >0 then DO;	
	    paramcd= 'TOTEDPS1';
	 	 aval_=lag_value;
	 end;
	 else do;
	         paramcd= 'TOTEDPS1'; 
            aval_=aval;
         end;
  end;
  
run;

 

 

data want;
input PARAMCD $ USUBJID $ VISIT $ AVAL :8. ABLFL :$ AVISITN :8. RTT1L28D :8. SWNREL :8. T2SWREL :8.;
infile datalines dlm = '|';
datalines;
EASITOTA|1001|Screening|16| |10|0|0|0
TOTEDPS1|1001|Screening|16| |10|0|0|0
EASITOTA|1001|Day 1    |13.4| |20|0|0|0
TOTEDPS1|1001|Day 1    |13.4| |20|0|0|0
EASITOTA|1001|Week 2   |8.1| |30|0|0|0
TOTEDPS1|1001|Week 2   |8.1| |30|0|0|0
EASITOTA|1001|Week 4   |5.6| |40|0|0|0
TOTEDPS1|1001|Week 4   |5.6| |40|0|0|0
EASITOTA|1001|Week 8   |6| |50|0|0|0
TOTEDPS1|1001|Week 8   |6| |50|0|0|0
EASITOTA|1001|Week 12  |8.7| |60|0|0|0
TOTEDPS1|1001|Week 12  |8.7| |60|0|0|0
EASITOTA|1002|Screening|14| |10|0|0|0
TOTEDPS1|1002|Screening|14| |10|0|0|0
EASITOTA|1002|Day 1    |16.4|Y|20|0|0|0
TOTEDPS1|1002|Day 1    |16.4|Y|20|0|0|0
EASITOTA|1002|Week 2   |13.8| |30|0|0|0
TOTEDPS1|1002|Week 2   |13.8| |30|0|0|0
EASITOTA|1002|Week 4   |21.4| |40|1|0|0
TOTEDPS1|1002|Week 4   |13.8| |40|1|0|0
EASITOTA|1002|Week 8   |14| |50|1|0|0
TOTEDPS1|1002|Week 8   |13.8| |50|1|0|0
EASITOTA|1002|Week 12  |15.6| |60|1|0|0
TOTEDPS1|1002|Week 12  |13.8| |60|1|0|0
EASITOTA|1002|Week 16  |21.4|	|70|1|0|0
TOTEDPS1|1002|Week 16  |13.8|	|70|1|0|0

 

Quentin
Super User

I'm still not sure I'm understanding.  But here's a bit of a stab at it, hopefully the approach will help.  I would usually approach LOCF by using RETAIN, rather than lag, something like:

 

data have;
input PARAMCD $ USUBJID $ VISIT $ AVAL :8. ABLFL :$ AVISITN :8. RTT1L28D :8. SWNREL :8. T2SWREL :8.;
infile datalines dlm = '|';
datalines;
EASITOTA|1001|Screening|16| |10|0|0|0
EASITOTA|1001|Day 1    |13.4| |20|0|0|0
EASITOTA|1001|Week 2   |8.1| |30|0|0|0
EASITOTA|1001|Week 4   |5.6| |40|0|0|0
EASITOTA|1001|Week 8   |6| |50|0|0|0
EASITOTA|1001|Week 12  |8.7| |60|0|0|0
EASITOTA|1002|Screening|14| |10|0|0|0
EASITOTA|1002|Day 1    |16.4|Y|20|0|0|0
EASITOTA|1002|Week 2   |13.8| |30|0|0|0
EASITOTA|1002|Week 4   |21.4| |40|1|0|0
EASITOTA|1002|Week 8   |14| |50|1|0|0
EASITOTA|1002|Week 12  |15.6| |60|1|0|0
EASITOTA|1002|Week 16  |21.4|	|70|1|0|0		
EASITOTA|1004|Screening|9.6| |10|0|0|0
EASITOTA|1004|Week 2   |8.6| |30|0|0|0
EASITOTA|1004|Week 4   |8.7| |40|0|0|0
EASITOTA|1005|Screening|11.6| |10|1|0|0
EASITOTA|1005|Day 1    |19.8|Y|20|1|0|0	
EASITOTA|1006|Screening|17.1| |10|0|0|0
EASITOTA|1006|Day 1    |15.8|Y|20|0|0|0
EASITOTA|1006|Unscheduled|6| |100|1|0|0		
EASITOTA|1007|Screening|33.3| |10|0|0|0
EASITOTA|1007|Day 1    |40.8|Y|20|0|0|1
EASITOTA|1007|Week 2   |44| |30|0|0|1
EASITOTA|1007|Week 4   |48| |40|0|0|1
EASITOTA|1007|Unscheduled|57.8| |100|0|0|1
;
run;

data want ;
  set have ;
  by USUBJID ;
  retain _RetainedAval ;
  if first.USUBJID then _RetainedAval=. ;
  if max(RTT1L28D,SWNREL,T2SWREL)=0 then _RetainedAval=Aval  ;
  output ;
  PARAMCD='TOTEDPS1' ;
  Aval=_RetainedAval ;
  if SWNREL=1 then Aval=. ;
  if VISIT ne 'Unschedu' then output ;
  *drop _RetainedAval ;
run ;

proc print data=want ;
run ;

 

BASUG is hosting free webinars Next up: Don Henderson presenting on using hash functions (not hash tables!) to segment data on June 12. Register now at the Boston Area SAS Users Group event page: https://www.basug.org/events.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 893 views
  • 1 like
  • 3 in conversation