Help using Base SAS procedures

A tricky one: Format an existing variable

Reply
Frequent Contributor
Posts: 131

A tricky one: Format an existing variable

I am trying to merge two datasets. Simply, data1 has variables of school, group, and sex1, and data2 has variables of school, group and sex2. Afer the following DATA step, the values for sex2 become 'O' if they are missing, not the 'Other' I want to get.

DATA work.data;
MERGE work.data1 work.data2;
BY school group;
LENGTH sex2 $5.;
FORMAT sex2 $5.;
IF MISSING(sex2) THEN sex2='Other';
RUN;
Respected Advisor
Posts: 4,173

Re: A tricky one: Format an existing variable

Hi

Have the variable declaration statements for sex2 before the merge.

DATA work.data;
LENGTH sex2 $5.;
FORMAT sex2 $5.;
MERGE work.data1 work.data2;
BY school group;
IF MISSING(sex2) THEN sex2='Other';
RUN;


Reason: Variable sex2 gets initiated based on the first occurrence in the code (which in your case is from the source data set where sex2 has a length of $1).

In your log there should also be a truncation warning.

HTH
Patrick
Frequent Contributor
Posts: 131

Re: A tricky one: Format an existing variable

Hi Patrick,

I tried your way, but the variable sex2 no longer exists in the new work.data.
PROC Star
Posts: 7,467

Re: A tricky one: Format an existing variable

Could it just have been moved to being the left most variable? If not, your data isn't similar to Patrick's example as his code does appear to work correctly.

Art
---------
> Hi Patrick,
>
> I tried your way, but the variable sex2 no longer
> exists in the new work.data.
Frequent Contributor
Frequent Contributor
Posts: 94

Re: A tricky one: Format an existing variable

SAS will pick up the first reference to a variable, and set the format and length from there. In your case that comes from the merge statement, hence the length of sex2 comes from the dataset work.data2.

If you move your length/format statements to before the merge, then your code should work (albeit the column order will have changed).

data data1;
format school $20.;
format group $20.;
format sex1 $1.;
infile datalines dsd;
input School $ Group $ Sex1 $;
datalines;
School1,Group1,M
School1,Group2,
School2,Group3,M
;
run;

data data2;
format school $20.;
format group $20.;
format sex2 $1.;
infile datalines dsd;
input School $ Group $ Sex2 $;
datalines;
School1,Group1,M
School1,Group2,M
School2,Group3,
;
run;

proc sort data=data1;
by school group;
run;

proc sort data=data2;
by school group;
run;

DATA work.data;
LENGTH sex2 $5.;
FORMAT sex2 $5.;
MERGE work.data1 work.data2;
BY school group;
IF MISSING(sex2) THEN sex2='Other';
RUN;
Ask a Question
Discussion stats
  • 4 replies
  • 128 views
  • 0 likes
  • 4 in conversation