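SAS fixes a variable's type and length at the first reference to it when the DATA step is compiled, and a variable read from a dataset also inherits that dataset's format unless a FORMAT statement overrides it. In your step, the first reference to sex2 is the MERGE statement, so sex2 takes its length of 1 (and its $1. format) from the dataset work.data2. Assigning 'Other' to it then stores only 'O'.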
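A minimal illustration of the first-reference rule, using a throwaway dataset named demo (a hypothetical name, not from the question): x is first referenced by assigning a one-character constant, so it gets length 1 and the later value 'Other' is truncated to 'O'. y is first referenced by a LENGTH statement, so it keeps the full value.

```sas
/* demo is a hypothetical throwaway dataset used only to show the rule. */
data demo;
  x = 'A';       /* first reference: x becomes character, length 1  */
  x = 'Other';   /* too long for x, so only 'O' is stored            */

  length y $5;   /* first reference is LENGTH: y gets length 5       */
  y = 'Other';   /* fits, so the full value is stored                */
run;

proc print data=demo;
run;
```

The printed dataset should show x as O and y as Other, which is the same truncation you are seeing with sex2.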
If you move your LENGTH and FORMAT statements before the MERGE statement, your code should work, although sex2 will then be the first column in the output dataset:
data data1;
format school $20.;
format group $20.;
format sex1 $1.;
infile datalines dsd;
input School $ Group $ Sex1 $;
datalines;
School1,Group1,M
School1,Group2,
School2,Group3,M
;
run;
data data2;
format school $20.;
format group $20.;
format sex2 $1.;
infile datalines dsd;
input School $ Group $ Sex2 $;
datalines;
School1,Group1,M
School1,Group2,M
School2,Group3,
;
run;
proc sort data=data1;
by school group;
run;
proc sort data=data2;
by school group;
run;
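DATA work.data;
/* LENGTH comes before MERGE, so it is the first reference to sex2 */
/* and the length becomes 5 instead of the 1 from work.data2.      */
LENGTH sex2 $5;
/* Override the $1. format stored in work.data2; otherwise the    */
/* value would still display as a single character.               */
FORMAT sex2 $5.;
MERGE work.data1 work.data2;
BY school group;
IF MISSING(sex2) THEN sex2='Other';
RUN;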
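If you want to keep the original column order, one option is to list every variable in the LENGTH statement, in the order you want the columns, so that LENGTH is the first reference for all of them. The sketch below is self-contained: the datasets left, right and want are hypothetical stand-ins for work.data1, work.data2 and work.data, built with the same $20./$1. formats as in the question and already in BY order, so no PROC SORT is needed.

```sas
/* Hypothetical stand-in for work.data1 (already sorted by school group). */
data left;
  format school group $20. sex1 $1.;
  infile datalines dsd truncover;
  input school group sex1;   /* already character, so no $ needed */
  datalines;
School1,Group1,M
School1,Group2,
;
run;

/* Hypothetical stand-in for work.data2, with sex2 formatted as $1. */
data right;
  format school group $20. sex2 $1.;
  infile datalines dsd truncover;
  input school group sex2;
  datalines;
School1,Group1,M
School1,Group2,
;
run;

data want;
  /* Listing all variables here fixes both their lengths and their */
  /* column order: school, group, sex1, sex2.                      */
  length school group $20 sex1 $1 sex2 $5;
  format sex2 $5.;   /* replace the $1. format inherited from right */
  merge left right;
  by school group;
  if missing(sex2) then sex2 = 'Other';
run;

proc print data=want;
run;
```

The trade-off is that you have to repeat the lengths of the other variables; if they ever change in the input datasets, this LENGTH statement has to be kept in sync.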