Solved: Re: Generating Missing Records for Deceased Subjects Up to a Specified...

smackerz1988 · Posted 01-10-2024 02:01 PM

Hello SAS Community,

I'm working on a clinical trial dataset (qs00) where I need to address a specific scenario involving deceased subjects. My objective is to generate new records for each missing visit up to a specified cycle number (&last_cycle_avisitn) for subjects who have died (DTHFL = 'Y'). Here's the situation:

When a subject is marked as deceased, I need to create records for all subsequent visits from their last known visit (AVISITN) to the &last_cycle_avisitn.
Each new record should have an imputed value of 0 for the variable AVAL.
The dataset qs00 contains visit information (AVISIT, AVISITN) along with other subject details.
I have created a lookup table (cycle_lookup) with all the possible visit numbers and corresponding visit names.

I attempted to use a hash object approach to achieve this. Here's a version of the code:

%let last_cycle_avisitn = 26; /* Example cycle number */

/* Step 1: Create a lookup table for all cycles */
proc sql noprint;
    create table cycle_lookup as
    select distinct AVISIT, AVISITN
    from qs00;
quit;

/* Step 2: Identify the last AVISITN for each deceased subject */
proc sql;
    create table last_visit_deceased as
    select USUBJID, max(AVISITN) as LSTVIS
    from qs00
    where DTHFL = 'Y' ;
    group by USUBJID;
quit;

/* Step 3: Create the hash objects and output the new dataset with imputed values */
data qs00_with_imputation;
    /* Load the cycle_lookup into a hash object for quick reference */
    if _n_ = 1 then do;
        declare hash cycle_lookup(dataset: 'cycle_lookup');
        cycle_lookup.defineKey('AVISITN');
        cycle_lookup.defineData('AVISIT', 'AVISITN');
        cycle_lookup.defineDone();

        declare hash last_visit(dataset: 'last_visit_deceased');
        last_visit.defineKey('USUBJID');
        last_visit.defineData('LSTVIS');
        last_visit.defineDone();
    end;

    /* Loop through the qs00 dataset */
    do until (eof);
        set qs00 end=eof;
        output; /* Output existing data */  

        /* For deceased subjects, check if we need to impute missing cycles */
        if DTHFL = 'Y' then do;
            rc = last_visit.find(key: USUBJID);	

            /* If we found the subject in the last_visit hash table */
            if rc = 0 and LSTVIS < &last_cycle_avisitn then do;
                /* Create and output records for missing cycles */
                do AVISITN = LSTVIS + 1 to &last_cycle_avisitn;
                    cycle_lookup.find(); /* Find the AVISIT matching the AVISITN */
                    AVAL = 0; /* Impute missing value */
                    output; /* Output the new imputed record */
                end;
            end;
        end;
    end;
run;

The challenge I'm facing is that the code either does not create the new records as intended or results in server disconnection due to potential memory overload. Additionally, there are concerns about whether the new records are only created for truly missing cycles.

I'm seeking advice on the following:

How to ensure that the code only creates records for missing cycles without duplicating existing records.
How to handle memory issues potentially caused by hash object operations or the creation of new records.
Any alternative strategies or optimizations for the code that could improve performance and accuracy.

Any insights or suggestions from the community would be greatly appreciated!

/* Sample simulated dataset to represent the structure of the clinical trial data */
data qs00;
    input USUBJID $ AVISIT $ AVISITN DTHFL $ AVAL;
    datalines;
    1001 Cycle1 1 Y 50
    1001 Cycle2 2 Y 55
    1001 Cycle3 3 Y 55
    1002 Cycle1 1 Y 60
    1002 Cycle2 2 Y 60   
    1003 Cycle1 1  65
    1003 Cycle2 2  70
    1003 Cycle3 3  75
    1003 Cycle4 4  80 
    1003 Cycle5 5  85
    ;
run;

/* Lookup table for all cycle names and numbers */
data cycle_lookup;
    input AVISIT $ AVISITN;
    datalines;
    Cycle1 1
    Cycle2 2
    Cycle3 3
    Cycle4 4
    Cycle5 5
    ;
run;

/* Last visit information for deceased subjects */
data last_visit_deceased;
    input USUBJID $ last_avisitn;
    datalines;
    1001 3
    1002 2
    ;
run;

/* Sample simulated outcome dataset */
data want;
input USUBJID $ AVISIT $ AVISITN DTHFL $ AVAL;
datalines;
1001 Cycle1 1 Y 50
1001 Cycle2 2 Y 55
1001 Cycle3 3 Y 55 
1001 Cycle4 4 Y 0 /* New record for deceased subject with imputed AVAL */
1001 Cycle5 5 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle1 1 Y 60
1002 Cycle2 2 Y 60 
1002 Cycle3 3 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle4 4 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle5 5 Y 0 /* New record for deceased subject with imputed AVAL */
1003 Cycle1 1  65
1003 Cycle2 2  70
1003 Cycle3 3  75
1003 Cycle4 4  80 
1003 Cycle5 5  85 
;
run;

In the qs00 dataset:

USUBJID is the subject ID.
AVISIT is the visit label.
AVISITN is the numeric visit number.
DTHFL indicates if the subject is deceased ('Y' for yes, ' ' for no).
AVAL is the value that will be imputed as 0 for missing records after death.

The cycle_lookup dataset contains a complete list of visits that are possible within the study up to the specified cycle number.

The last_visit_deceased dataset contains the last visit information for subjects who are deceased. This is used to determine from which visit number we need to start imputing values.

yabwon · Posted 01-10-2024 05:47 PM

Here is an option:

/* Sample simulated dataset to represent the structure of the clinical trial data */
data qs00;
    infile cards dsd dlm="," missover;
    input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
;
run;
proc print data = qs00;
run;

/* Lookup table for all cycle names and numbers */
data cycle_lookup;
    input AVISIT $ AVISITN;
datalines;
Cycle1 1
Cycle2 2
Cycle3 3
Cycle4 4
Cycle5 5
;
run;

proc print data = cycle_lookup;
run;


%let last_cycle_avisitn = 5;

data want;
  if _N_=1 then
    do;
      declare hash C(dataset:"cycle_lookup");
      C.defineKey("AVISITN");
      C.defineData("AVISIT");
      C.defineDone();
    end;

  array v[&last_cycle_avisitn.] _temporary_;
  call missing(of v[*]);

  do until(last.USUBJID);
    set qs00;
    by USUBJID;
    v[AVISITN] = 1;
    output;
  end;

  do AVISITN=1 to &last_cycle_avisitn.;
    if v[AVISITN] = . then 
      do;
        _N_ = C.find();
        AVAL=0;
        output;
      end;
  end;
run;

proc print data = want;
run;

But it looks like "hash-less" one, with just arrays:

%let last_cycle_avisitn = 5;

data want2;
 
  /* array for mising visits */
  array vn[&last_cycle_avisitn.] _temporary_;
  /* array for visits names */
  array vc[&last_cycle_avisitn.] $ 6 _temporary_;
  if _N_=1 then do until(EOF1);
    set cycle_lookup end=EOF1;
    vc[AVISITN]=AVISIT;
    put _all_;
  end;

  set qs00;
  by USUBJID;

  if first.USUBJID then call missing(of vn[*]);

  vn[AVISITN] = 1;
  output;
  
  if last.USUBJID then 
    do AVISITN = 1 to &last_cycle_avisitn.;
      if ^vn[AVISITN] then 
        do;
          AVISIT=vc[AVISITN];
          AVAL=0;
          output;
        end;
    end;
run;

proc print data = want2;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

View solution in original post

yabwon · Posted 01-10-2024 05:47 PM

Here is an option:

/* Sample simulated dataset to represent the structure of the clinical trial data */
data qs00;
    infile cards dsd dlm="," missover;
    input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
;
run;
proc print data = qs00;
run;

/* Lookup table for all cycle names and numbers */
data cycle_lookup;
    input AVISIT $ AVISITN;
datalines;
Cycle1 1
Cycle2 2
Cycle3 3
Cycle4 4
Cycle5 5
;
run;

proc print data = cycle_lookup;
run;


%let last_cycle_avisitn = 5;

data want;
  if _N_=1 then
    do;
      declare hash C(dataset:"cycle_lookup");
      C.defineKey("AVISITN");
      C.defineData("AVISIT");
      C.defineDone();
    end;

  array v[&last_cycle_avisitn.] _temporary_;
  call missing(of v[*]);

  do until(last.USUBJID);
    set qs00;
    by USUBJID;
    v[AVISITN] = 1;
    output;
  end;

  do AVISITN=1 to &last_cycle_avisitn.;
    if v[AVISITN] = . then 
      do;
        _N_ = C.find();
        AVAL=0;
        output;
      end;
  end;
run;

proc print data = want;
run;

But it looks like "hash-less" one, with just arrays:

%let last_cycle_avisitn = 5;

data want2;
 
  /* array for mising visits */
  array vn[&last_cycle_avisitn.] _temporary_;
  /* array for visits names */
  array vc[&last_cycle_avisitn.] $ 6 _temporary_;
  if _N_=1 then do until(EOF1);
    set cycle_lookup end=EOF1;
    vc[AVISITN]=AVISIT;
    put _all_;
  end;

  set qs00;
  by USUBJID;

  if first.USUBJID then call missing(of vn[*]);

  vn[AVISITN] = 1;
  output;
  
  if last.USUBJID then 
    do AVISITN = 1 to &last_cycle_avisitn.;
      if ^vn[AVISITN] then 
        do;
          AVISIT=vc[AVISITN];
          AVAL=0;
          output;
        end;
    end;
run;

proc print data = want2;
run;

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

mkeintz · Posted 01-11-2024 12:36 AM

Apparently, your data are sorted by AVISITN within USUBJID, and no USUBJID has any interior "holes" in AVISITN. (Only trailing holes for those who died prior to the last intended AVISITN).

If so, then you can have very simple code, in a single data step:


data qs00;
    infile cards dsd dlm="," missover;
    input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
run ;


%let last_cycle_avisitn=5;
data want;
  set qs00;;
  by usubjid;
  output;

  if _n_=1 then do;
    declare hash avisit_lookup (dataset:'qs00 (keep=avisitn avisit)');
      avisit_lookup.definekey('avisitn');
      avisit_lookup.definedata(all:'Y');
      avisit_lookup.definedone();
  end;

  if last.usubjid and dthfl='Y' then do;
    aval=0;
    do while (avisitn<&last_cycle_avisitn);  /* Output needed new records */
      avisitn=avisitn+1;
      avisit_lookup.find() ;
      output;
    end;
  end;
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

smackerz1988 · Posted 01-11-2024 06:34 AM

Thanks @yabwon and @mkeintz all these approaches work really well with the dataset I provided. However, I might have oversimplified my issue so I will need to do another post to explain better using a generic anonymised version of my actual data

smackerz1988 · Posted 01-11-2024 09:06 AM

@mkeintz @yabwon Please see updated post Generate Missing Records for Deceased Subjects Up to a Specified Cycle... - SAS Support Communities

Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Re: Generating Missing Records for Deceased Subjects Up to a Specified Cycle using hash object

Click image to register for webinar

Classroom Training Available!