The basic problem is that two different operations (pre-processing the data2 variables and appending data2 to data1) are crammed into a single data step. IMHO, both the manual PDV refreshing and the conditional processing that relies on the in= data set option variable seem contrived. (Besides, the in= option for data1 is not used at all.)
The pre-processing of data2 can and should be separated out, without any performance penalty, by using a view. A data step view stores only the instructions and runs them when the view is read, so data2 is still read just once and no intermediate copy is written. Things like this are what views are for.
[pre]
data data1;
infile datalines firstobs=2;
input @1 name $15. @17 dob yymmdd8. @26 age;
datalines;
----+----1----+----2----+-
John Smith      19851225 25
Jack Bauer      19600704 50
Charlie Day     19791021 31
;
run;
data data2;
infile datalines firstobs=2;
input @1 name $15. @17 person_dob @26 person_age $2.;
datalines;
----+----1----+----2----+-
Patrick Stewart 19500406 60
Steve Jobs      19520115 58
Bill Gates      19510803 59
;
run;
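/* view: stores only the conversion logic, which runs when view2 is read */
data view2/view=view2;
set data2;
/* person_dob is numeric (e.g. 19500406): write it as text, then read it back as a date */
dob = input(put(person_dob, 8.), yymmdd8.);
format dob yymmdd8.;
/* person_age is character: convert it to numeric */
age = input(person_age, best.);
keep name dob age;
run;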
data both;
set data1 view2;
run;
/* check */
proc print data=both;
run;
/*
Obs    name                    dob    age
 1     John Smith         85-12-25     25
 2     Jack Bauer         60-07-04     50
 3     Charlie Day        79-10-21     31
 4     Patrick Stewart    50-04-06     60
 5     Steve Jobs         52-01-15     58
 6     Bill Gates         51-08-03     59
*/
[/pre]
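If you prefer SQL, the same pre-processing could probably be written as a PROC SQL view instead. This is only a sketch under the same assumptions as above (person_dob numeric, person_age character); the view name view2_sql is made up for illustration.
[pre]
proc sql;
/* view2_sql is a hypothetical name; like the data step view, it stores only the query */
create view view2_sql as
select name,
       input(put(person_dob, 8.), yymmdd8.) as dob format=yymmdd8.,
       input(person_age, 2.) as age
from data2;
quit;
[/pre]
It can then take the place of view2 in the set statement (set data1 view2_sql;).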