@ImSpartacus wrote: Thanks for the tips. Concerning the source datasets, they have dozens of fields and they do vary slightly. However, the handful of fields that I actually need are mercifully consistent across all files except for one small exception: SSN. On some files, SSN is stored with dashes (XXX-XX-XXXX) and on some, it's stored without dashes (XXXXXXXXX). I assumed that might be a simple enough thing to overcome, but it turned out to be slightly more complex than I expected.
If that is the only difference, then a read program can read SSN as a character variable using the longer width (11 characters). To make all the data consistent, it would be easy to either remove the dashes (COMPRESS function) or substring the value and concatenate the pieces with dashes (CATX function).
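data junk;
  /* Read SSN as character, wide enough for the dashed form (11 characters) */
  input ssn $ 1-11;
  /* Dashed value: strip the dashes */
  if length(ssn)=11 then nodash=compress(ssn,'-');
  /* Undashed value: rebuild it as XXX-XX-XXXX */
  if length(ssn)=9 then adddash=catx('-',substr(ssn,1,3),substr(ssn,4,2),substr(ssn,6));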
datalines;
123-45-6789
123456789
;
run;
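If you would rather end up with one consistent pair of variables on every row, regardless of which form came in, something like the following might work. This is only a sketch: the dataset name ssn_clean and the variables ssn_digits and ssn_dashed are made up for illustration, and it assumes every valid SSN contains exactly nine digits.

```sas
data ssn_clean;
  /* Hypothetical variable names; lengths set up front to avoid truncation */
  length ssn $11 ssn_digits $11 ssn_dashed $11;
  input ssn $ 1-11;
  /* 'kd' modifiers: keep only the digit characters, dropping dashes and anything else */
  ssn_digits = compress(ssn, , 'kd');
  /* Rebuild the dashed form only when exactly nine digits were found */
  if length(ssn_digits) = 9 then
    ssn_dashed = catx('-', substr(ssn_digits,1,3), substr(ssn_digits,4,2), substr(ssn_digits,6));
datalines;
123-45-6789
123456789
;
run;
```

With this approach both sample rows should carry the same ssn_digits and ssn_dashed values, so either one could serve as a merge key across the files. Rows where ssn_dashed stays blank would be worth reviewing by hand.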