This solution is using Array. The subject is used as index of the array. This helps to decide in one probe whether the subject of AE is present in the PATDATA and if so TRT_CODE is harvested. This method of using array is called KEY INDEXing. Because it decides in one hit, this method is faster than HASH or other solutions. I wish you run all the methods for joy and compare run-times. To be frugal in using the memory, we size the array with say 263:275 from your PATDATA as one argument. We also need how many times a SUBJECT repeats. You know that 263 repeats 3 times but others are less. So the ecnomical use of array will be: array s[263:275, 3] _temporary_; You may say that 266 to 274 are not present in PATDATA. Array can't be declared in pieces and so we live with that limitation ! We call 263 as minsub, 275 as maxsub and 3 as maxGroup. For general purpose, these 3 values can be obtained by scanning the PATDATA data set as: data _null_;
retain max minsub maxsub;
do count = 1 by 1 until(last.subject);
set patdata end = eof;
by subject;
if last.subject then do;
max = max(max, count);
minsub = min(minsub, subject);
maxsub = max(maxsub, subject);
end;
end;
if eof then do;
call symputx('maxGroup', max);
call symputx('minsub', minsub);
call symputx('maxsub', maxsub);
end;
run;
%put &minsub;
%put &maxsub;
%put &maxGroup; The assumption is that the PATDATA, at least be GROUPED by SUBJECT and the AE data set need not be sorted by SUBJECT. The solution consists of two do-loops. When _N_ = 1, the first do-loop loads the TRT_CODE from PATDATA into the array. Remember that if a SUBJECT repeats, COUNT takes care of the correct cell of the array. data want;
array s[&minsub:&maxsub,&maxGroup] $1 _temporary_;
array c[&minsub:&maxsub] _temporary_;
if _n_ = 1 then do;
do until(eof);
do count = 1 by 1 until(last.subject);
set patdata end = eof;
by subject ;
s[subject, count] = trt_code;
end;
c[subject] = count;
end;
end; The second do-loop first ckecks whether the SUBJECT from AE falls between the RANGE(&minsub and &maxsub). If it doen't fall, the default missing value for TRT_CODE is written to output data set. If it falls then a check is made on the first cell of the SUBJECT whether it holds a non-missing TRT_CODE. Remember situation like missing SUBJECTS with 266 to 274, which by default, will hold missing TRT_CODE in the array. do until(last);
set ae end = last;
trt_code = ' ';
if ^(&minsub <= subject <= &maxsub) then output;
else if not missing(s[subject,1]) then
do i = 1 to c[subject];
trt_code = s[subject,i];
/** Your computations go here ..... **/
output;
end;
else output;
end;
drop i count;
run; Hope this is an acceptable solution to you. Don't forget feed us your comparison of solutions.
... View more