If you create an empty data set with the desired attributes (e.g., lengths), you can use it to control the variable lengths when appending.
data dummy_data;
length last_name $30 first_name $20;
stop; /* ends the step before any observation is written, so dummy_data has 0 rows */
run;
data mydata;
set dummy_data data1 data2;
run;
In the "data mydata" step, the length of each variable is determined by its first reference. In this case that is dummy_data, which has zero observations but still carries the metadata for each variable, including the lengths assigned in the prior step. The advantage is that you can make dummy_data a permanent data set and then reuse it to process any update files (i.e., run only the second step) with no further length assignments. The more variables you have to deal with, the more beneficial this technique becomes.
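Here is a small self-contained sketch of that behavior. The contents of data1 and data2 (a single last_name variable with lengths 5 and 12) are made-up values for illustration only. Because dummy_data is named first in the SET statement, last_name should come out with length 30, and the template contributes no rows.

```sas
/* Hypothetical inputs: last_name has a different length in each file */
data data1;
  length last_name $5;
  last_name = "Smith";
run;

data data2;
  length last_name $12;
  last_name = "Fitzgerald";
run;

/* Zero-observation template that carries only the desired length */
data dummy_data;
  length last_name $30;
  stop;
run;

/* dummy_data is referenced first, so last_name gets length 30; it adds no rows */
data mydata;
  set dummy_data data1 data2;
run;

/* Inspect the resulting variable attributes */
proc contents data=mydata;
run;
```

PROC CONTENTS is just one convenient way to confirm the length; the point is that the first data set in the SET statement sets the attribute, even with zero observations.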
This behavior means that
data mydata;
set data1 data2;
run;
will assign lengths determined by DATA1, as you have discovered.
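To see the truncation this causes, here is a minimal sketch using the same made-up inputs (last_name is $5 in data1 and $12 in data2; these values are illustrative, not from your data). Since data1 is referenced first, last_name gets length 5 and the data2 value is cut short. SAS typically also writes a warning about multiple lengths to the log in this situation.

```sas
/* Hypothetical inputs: the shorter length comes first */
data data1;
  length last_name $5;
  last_name = "Smith";
run;

data data2;
  length last_name $12;
  last_name = "Fitzgerald";
run;

/* last_name takes length 5 from data1, so "Fitzgerald" is stored as "Fitzg" */
data mydata;
  set data1 data2;
run;

proc print data=mydata;
run;
```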
Of course, you could swap the order of the data set names in the SET statement (set data2 data1). But if you really want all the data1 observations to precede all the data2 observations, you could instead use
data mydata;
if 0 then set data2; /* never executes, but the compiler reads data2's attributes first */
set data1 data2;
run;
Because IF 0 is never true, that first SET statement never reads any data, but the compiler still encounters the lengths in data2 first, even though the data2 observations follow those from data1. And of course there is a risk that data2 will not always have the longer lengths.
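If you do rely on the IF 0 THEN SET trick, one way to guard against that risk is to compare lengths before appending. The sketch below queries DICTIONARY.COLUMNS in PROC SQL and keeps only the variables that are longer in data1 than in data2, i.e., the ones that would be truncated. The sample data sets and the output table name risky are hypothetical, chosen for illustration.

```sas
/* Hypothetical inputs: here data2 does NOT have the longer length */
data data1;
  length last_name $12;
  last_name = "Fitzgerald";
run;

data data2;
  length last_name $5;
  last_name = "Smith";
run;

/* Variables whose data1 length exceeds their data2 length would be truncated */
proc sql;
  create table risky as
  select a.name, a.length as len_data1, b.length as len_data2
    from dictionary.columns as a
         inner join dictionary.columns as b
         on upcase(a.name) = upcase(b.name)
    where a.libname = "WORK" and a.memname = "DATA1"
      and b.libname = "WORK" and b.memname = "DATA2"
      and a.length > b.length;
quit;

proc print data=risky;
run;
```

An empty risky table means data2's lengths are at least as long for every shared variable; a non-empty one is a sign the dummy_data approach is the safer choice.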
All of this is a long-winded way of recommending the dummy_data approach for any work that will be repeated.