Hello SAS Community,
I'm working on a clinical trial dataset (qs00
) where I need to address a specific scenario involving deceased subjects. My objective is to generate new records for each missing visit up to a specified cycle number (&last_cycle_avisitn
) for subjects who have died (DTHFL = 'Y'
). Here's the situation:
AVISITN
) to the &last_cycle_avisitn
.AVAL
.qs00
contains visit information (AVISIT
, AVISITN
) along with other subject details.cycle_lookup
) with all the possible visit numbers and corresponding visit names.I attempted to use a hash object approach to achieve this. Here's a version of the code:
%let last_cycle_avisitn = 26; /* Example cycle number */
/* Step 1: Create a lookup table for all cycles */
proc sql noprint;
create table cycle_lookup as
select distinct AVISIT, AVISITN
from qs00;
quit;
/* Step 2: Identify the last AVISITN for each deceased subject */
proc sql;
create table last_visit_deceased as
select USUBJID, max(AVISITN) as LSTVIS
from qs00
where DTHFL = 'Y' ;
group by USUBJID;
quit;
/* Step 3: Create the hash objects and output the new dataset with imputed values */
data qs00_with_imputation;
/* Load the cycle_lookup into a hash object for quick reference */
if _n_ = 1 then do;
declare hash cycle_lookup(dataset: 'cycle_lookup');
cycle_lookup.defineKey('AVISITN');
cycle_lookup.defineData('AVISIT', 'AVISITN');
cycle_lookup.defineDone();
declare hash last_visit(dataset: 'last_visit_deceased');
last_visit.defineKey('USUBJID');
last_visit.defineData('LSTVIS');
last_visit.defineDone();
end;
/* Loop through the qs00 dataset */
do until (eof);
set qs00 end=eof;
output; /* Output existing data */
/* For deceased subjects, check if we need to impute missing cycles */
if DTHFL = 'Y' then do;
rc = last_visit.find(key: USUBJID);
/* If we found the subject in the last_visit hash table */
if rc = 0 and LSTVIS < &last_cycle_avisitn then do;
/* Create and output records for missing cycles */
do AVISITN = LSTVIS + 1 to &last_cycle_avisitn;
cycle_lookup.find(); /* Find the AVISIT matching the AVISITN */
AVAL = 0; /* Impute missing value */
output; /* Output the new imputed record */
end;
end;
end;
end;
run;
The challenge I'm facing is that the code either does not create the new records as intended or results in server disconnection due to potential memory overload. Additionally, there are concerns about whether the new records are only created for truly missing cycles.
I'm seeking advice on the following:
Any insights or suggestions from the community would be greatly appreciated!
/* Sample simulated dataset to represent the structure of the clinical trial data */ data qs00; input USUBJID $ AVISIT $ AVISITN DTHFL $ AVAL; datalines; 1001 Cycle1 1 Y 50 1001 Cycle2 2 Y 55 1001 Cycle3 3 Y 55 1002 Cycle1 1 Y 60 1002 Cycle2 2 Y 60 1003 Cycle1 1 65 1003 Cycle2 2 70 1003 Cycle3 3 75
1003 Cycle4 4 80
1003 Cycle5 5 85 ; run; /* Lookup table for all cycle names and numbers */ data cycle_lookup; input AVISIT $ AVISITN; datalines; Cycle1 1 Cycle2 2 Cycle3 3 Cycle4 4 Cycle5 5 ; run; /* Last visit information for deceased subjects */ data last_visit_deceased; input USUBJID $ last_avisitn; datalines; 1001 3 1002 2 ; run;
/* Sample simulated outcome dataset */
data want;
input USUBJID $ AVISIT $ AVISITN DTHFL $ AVAL;
datalines;
1001 Cycle1 1 Y 50
1001 Cycle2 2 Y 55
1001 Cycle3 3 Y 55
1001 Cycle4 4 Y 0 /* New record for deceased subject with imputed AVAL */
1001 Cycle5 5 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle1 1 Y 60
1002 Cycle2 2 Y 60
1002 Cycle3 3 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle4 4 Y 0 /* New record for deceased subject with imputed AVAL */
1002 Cycle5 5 Y 0 /* New record for deceased subject with imputed AVAL */
1003 Cycle1 1 65
1003 Cycle2 2 70
1003 Cycle3 3 75
1003 Cycle4 4 80
1003 Cycle5 5 85
;
run;
In the qs00
dataset:
USUBJID
is the subject ID.AVISIT
is the visit label.AVISITN
is the numeric visit number.DTHFL
indicates if the subject is deceased ('Y' for yes, ' ' for no).AVAL
is the value that will be imputed as 0 for missing records after death.The cycle_lookup
dataset contains a complete list of visits that are possible within the study up to the specified cycle number.
The last_visit_deceased
dataset contains the last visit information for subjects who are deceased. This is used to determine from which visit number we need to start imputing values.
Here is an option:
/* Sample simulated dataset to represent the structure of the clinical trial data */
data qs00;
infile cards dsd dlm="," missover;
input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
;
run;
proc print data = qs00;
run;
/* Lookup table for all cycle names and numbers */
data cycle_lookup;
input AVISIT $ AVISITN;
datalines;
Cycle1 1
Cycle2 2
Cycle3 3
Cycle4 4
Cycle5 5
;
run;
proc print data = cycle_lookup;
run;
%let last_cycle_avisitn = 5;
data want;
if _N_=1 then
do;
declare hash C(dataset:"cycle_lookup");
C.defineKey("AVISITN");
C.defineData("AVISIT");
C.defineDone();
end;
array v[&last_cycle_avisitn.] _temporary_;
call missing(of v[*]);
do until(last.USUBJID);
set qs00;
by USUBJID;
v[AVISITN] = 1;
output;
end;
do AVISITN=1 to &last_cycle_avisitn.;
if v[AVISITN] = . then
do;
_N_ = C.find();
AVAL=0;
output;
end;
end;
run;
proc print data = want;
run;
But it looks like "hash-less" one, with just arrays:
%let last_cycle_avisitn = 5;
data want2;
/* array for mising visits */
array vn[&last_cycle_avisitn.] _temporary_;
/* array for visits names */
array vc[&last_cycle_avisitn.] $ 6 _temporary_;
if _N_=1 then do until(EOF1);
set cycle_lookup end=EOF1;
vc[AVISITN]=AVISIT;
put _all_;
end;
set qs00;
by USUBJID;
if first.USUBJID then call missing(of vn[*]);
vn[AVISITN] = 1;
output;
if last.USUBJID then
do AVISITN = 1 to &last_cycle_avisitn.;
if ^vn[AVISITN] then
do;
AVISIT=vc[AVISITN];
AVAL=0;
output;
end;
end;
run;
proc print data = want2;
run;
Bart
Here is an option:
/* Sample simulated dataset to represent the structure of the clinical trial data */
data qs00;
infile cards dsd dlm="," missover;
input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
;
run;
proc print data = qs00;
run;
/* Lookup table for all cycle names and numbers */
data cycle_lookup;
input AVISIT $ AVISITN;
datalines;
Cycle1 1
Cycle2 2
Cycle3 3
Cycle4 4
Cycle5 5
;
run;
proc print data = cycle_lookup;
run;
%let last_cycle_avisitn = 5;
data want;
if _N_=1 then
do;
declare hash C(dataset:"cycle_lookup");
C.defineKey("AVISITN");
C.defineData("AVISIT");
C.defineDone();
end;
array v[&last_cycle_avisitn.] _temporary_;
call missing(of v[*]);
do until(last.USUBJID);
set qs00;
by USUBJID;
v[AVISITN] = 1;
output;
end;
do AVISITN=1 to &last_cycle_avisitn.;
if v[AVISITN] = . then
do;
_N_ = C.find();
AVAL=0;
output;
end;
end;
run;
proc print data = want;
run;
But it looks like "hash-less" one, with just arrays:
%let last_cycle_avisitn = 5;
data want2;
/* array for mising visits */
array vn[&last_cycle_avisitn.] _temporary_;
/* array for visits names */
array vc[&last_cycle_avisitn.] $ 6 _temporary_;
if _N_=1 then do until(EOF1);
set cycle_lookup end=EOF1;
vc[AVISITN]=AVISIT;
put _all_;
end;
set qs00;
by USUBJID;
if first.USUBJID then call missing(of vn[*]);
vn[AVISITN] = 1;
output;
if last.USUBJID then
do AVISITN = 1 to &last_cycle_avisitn.;
if ^vn[AVISITN] then
do;
AVISIT=vc[AVISITN];
AVAL=0;
output;
end;
end;
run;
proc print data = want2;
run;
Bart
Apparently, your data are sorted by AVISITN within USUBJID, and no USUBJID has any interior "holes" in AVISITN. (Only trailing holes for those who died prior to the last intended AVISITN).
If so, then you can have very simple code, in a single data step:
data qs00;
infile cards dsd dlm="," missover;
input USUBJID :$ 4. AVISIT :$ 6. AVISITN DTHFL :$ 1. AVAL;
datalines;
1001,Cycle1,1,Y,50
1001,Cycle2,2,Y,55
1001,Cycle3,3,Y,55
1002,Cycle1,1,Y,60
1002,Cycle2,2,Y,60
1003,Cycle1,1,,65
1003,Cycle2,2,,70
1003,Cycle3,3,,75
1003,Cycle4,4,,80
1003,Cycle5,5,,85
run ;
%let last_cycle_avisitn=5;
data want;
set qs00;;
by usubjid;
output;
if _n_=1 then do;
declare hash avisit_lookup (dataset:'qs00 (keep=avisitn avisit)');
avisit_lookup.definekey('avisitn');
avisit_lookup.definedata(all:'Y');
avisit_lookup.definedone();
end;
if last.usubjid and dthfl='Y' then do;
aval=0;
do while (avisitn<&last_cycle_avisitn); /* Output needed new records */
avisitn=avisitn+1;
avisit_lookup.find() ;
output;
end;
end;
run;
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.