DATA Step, Macro, Functions and more

what's wrong with my match merge:

Reply
Occasional Contributor
Posts: 17

what's wrong with my match merge:

Dear all,

I have two datasets, in dataset one, I have some states with state id(stateid) but vacant state name(statename). In dataset two, I have only two variables: state id(stataid), state name(statename), and all stateid and statename are not vacant, I sort both dataset by stateid and merge the two, my code is as follows:

             data three;

                   merge one two;

                        by stateid;

             run;

but  I still have the same vacant state names!

What's wrong with my code?

 

Thanks a lot!

 

 

Trusted Advisor
Posts: 1,394

Re: what's wrong with my match merge:

When you have same variable (state name) on both data sets, only one takes place in output.

You can either drop the vacant sate name or use update instead merge, where the second data set

updates variables from the first one.

Super User
Posts: 6,948

Re: what's wrong with my match merge:

[ Edited ]

Let's dig a little into the workings of the data step.

data have1;
infile cards truncover;
input state_id state_name $;
cards;
1 
1 
2 
2 
3 
;
run;

data have2;
infile cards truncover;
input state_id state_name $;
cards;
1 NY
2 CA
2 CA
3 FL
;
run;

data want;
merge have1 have2;
by state_id;
run;

When the third data step is compiled, SAS looks at the structure of the contributing datasets, and creates a PDV (program data vector) with the variables state_id and state_name.

When the data step executes its first iteration, data is read from have1, resulting in an empty state_name; next, data from have2 is read (as state_id matches), and state_name is set to "NY". Data is output.

Then, in the second iteration, a matching record from have1 is read, once again state_name is set to empty; since no further matching record is present in have2, no data is read from there, and therefore the empty state_name is written to the ouput.

With state_id=2 I deliberately inserted a second record in have2, and since a second matching read from have2 can be performed, the value "CA" is set for both empty records in have1.

A proper merge would look like that:

data want;
merge
  have1
  have2 (rename=(state_name=new_state_name))
;
by state_id;
if state_name = ' ' then state_name = new_state_name;
drop new_state_name;
run;

 

Edit: took care of improper code formatting and typos

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Super User
Posts: 5,085

Re: what's wrong with my match merge:

The values for STATE_NAME get read only once, from each source of data.  So on the first observation per StateID, the STATE_NAME always comes from TWO, as the values from TWO overwrite the values from ONE.  But after the first observation per StateID, the values from ONE are read in and no longer get overwritten by values from TWO.  So the simple solution would be:

 

data three;

merge one (drop=state_name) two;

by stateID;

run;

Ask a Question
Discussion stats
  • 3 replies
  • 216 views
  • 0 likes
  • 4 in conversation