BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ybz12003
Rhodochrosite | Level 12

Hello,

 

I have dataset A with 18000 obs and 900 columns (Name, Status, ect.), and dataset B with 200 obs and two columns (Name and Status).   All the 200 "name" in dataset B come from dataset A, but with new updated "status".  I would like to merge dataset A and B so that I could update the "status" column in dataset A.  In other words, my final purpose is to get full dataset A, with the same 18000 obs and 900 columns, and the "Status" column should include updating "status" column from dataset B of the same "Name".  Please advice how to approach it, thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@ybz12003 wrote:
And I have sorted both datasets too.

And the question about duplicate VALUES?

 

data want;

    merge big small;

    by name;

run;

 

is the basic code.

BUT if Small has a duplicate value for a name and Big does not then you get multiple output records for that name.

If Big has a name value duplicated and Small doesn't then all the duplicated name values in Big will get the same "status" from Small.

If BOTH have duplicates then the result can be very unpredictable and seldom as desired.

View solution in original post

6 REPLIES 6
ballardw
Super User

Does the big data set have duplicates of the same "Name" variable?

If so , do you want to update ALL of the status to the same value?

 

In this situation, then sort both data sets by Name, and merge by Name.

As long as the variable Status is the correct type then what you want will be attempted.

If the status variable is character and has different lengths you will get a warning about such and that you may have truncated data?

 

When you have multiple names in the big set and only want specific records updated from the small set you need to provide additional information to match the data so merge will work correctly.

 

If you have one value in the big set with possibly multiple records with the same name in the small set you would use UPDATE instead of Merge but the data again needs to be sorted by name and then use :

 

data want;

    update bigset smallset;

by name;

The order on the Update statement is important. You can update the Bigset by using that instead of Want.

 

Be aware: update will add records if there are Names in the smallset that do not match exactly with Names in the big set.

 

Does the small data set have duplicates of the "Name" variable? If so you may need to provide additional information on how to correctly apply the merge.

ybz12003
Rhodochrosite | Level 12

Both datasets have the same variable names, and same length and same characters. 

ybz12003
Rhodochrosite | Level 12
And I have sorted both datasets too.
ballardw
Super User

@ybz12003 wrote:
And I have sorted both datasets too.

And the question about duplicate VALUES?

 

data want;

    merge big small;

    by name;

run;

 

is the basic code.

BUT if Small has a duplicate value for a name and Big does not then you get multiple output records for that name.

If Big has a name value duplicated and Small doesn't then all the duplicated name values in Big will get the same "status" from Small.

If BOTH have duplicates then the result can be very unpredictable and seldom as desired.

ybz12003
Rhodochrosite | Level 12

When I sorted, I use nodup function to remove both duplicated name in both datasets.

Reeza
Super User
Have you tried the suggested solutions to see if they work for you? I would also recommend generating a Minimum Complete Example that covers as many scenarios you can think of and test it first. Then apply it to your actual data.

Either way, a merge is a backup that is guaranteed to work.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 647 views
  • 0 likes
  • 3 in conversation