BookmarkSubscribeRSS Feed
owenwqp1
Obsidian | Level 7

Dear all,

I have two datasets, in dataset one, I have some states with state id(stateid) but vacant state name(statename). In dataset two, I have only two variables: state id(stataid), state name(statename), and all stateid and statename are not vacant, I sort both dataset by stateid and merge the two, my code is as follows:

             data three;

                   merge one two;

                        by stateid;

             run;

but  I still have the same vacant state names!

What's wrong with my code?

 

Thanks a lot!

 

 

3 REPLIES 3
Shmuel
Garnet | Level 18

When you have same variable (state name) on both data sets, only one takes place in output.

You can either drop the vacant sate name or use update instead merge, where the second data set

updates variables from the first one.

Kurt_Bremser
Super User

Let's dig a little into the workings of the data step.

data have1;
infile cards truncover;
input state_id state_name $;
cards;
1 
1 
2 
2 
3 
;
run;

data have2;
infile cards truncover;
input state_id state_name $;
cards;
1 NY
2 CA
2 CA
3 FL
;
run;

data want;
merge have1 have2;
by state_id;
run;

When the third data step is compiled, SAS looks at the structure of the contributing datasets, and creates a PDV (program data vector) with the variables state_id and state_name.

When the data step executes its first iteration, data is read from have1, resulting in an empty state_name; next, data from have2 is read (as state_id matches), and state_name is set to "NY". Data is output.

Then, in the second iteration, a matching record from have1 is read, once again state_name is set to empty; since no further matching record is present in have2, no data is read from there, and therefore the empty state_name is written to the ouput.

With state_id=2 I deliberately inserted a second record in have2, and since a second matching read from have2 can be performed, the value "CA" is set for both empty records in have1.

A proper merge would look like that:

data want;
merge
  have1
  have2 (rename=(state_name=new_state_name))
;
by state_id;
if state_name = ' ' then state_name = new_state_name;
drop new_state_name;
run;

 

Edit: took care of improper code formatting and typos

Astounding
PROC Star

The values for STATE_NAME get read only once, from each source of data.  So on the first observation per StateID, the STATE_NAME always comes from TWO, as the values from TWO overwrite the values from ONE.  But after the first observation per StateID, the values from ONE are read in and no longer get overwritten by values from TWO.  So the simple solution would be:

 

data three;

merge one (drop=state_name) two;

by stateID;

run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 876 views
  • 0 likes
  • 4 in conversation