BookmarkSubscribeRSS Feed
dereck255
Fluorite | Level 6

Hello,

I have a data set that has a large about of information between two systems. The observations in the data include a Name, an ID1, ID2, Cat1, Cat2, Cat3, Date in System1, Date in System2.

I expect all entries to have dates in both systems. Right now, each "Name" is listed twice:

NameID1ID2Cat1Cat2Cat3DateSys1DateSys2
Adam10012002ABC11Jan2015.
Adam10012002ABC.12Jan2015

What I would like to see is:

NameID1ID2Cat1Cat2Cat3DateSys1DateSys2
Adam10012002ABC11Jan201512Jan2015

Where the duplicate entry is merged based on the observations being equal in Name, ID1, ID2, Cat1, Cat2, Cat3.

Help? Thank you!

7 REPLIES 7
ballardw
Super User

How did you merge the data?

dereck255
Fluorite | Level 6

I merged the data sets by Name, ID1, ID2, Cat1, Cat2, Cat3 but it didn't exactly work. So how can I do it after the datasets merged?

ballardw
Super User

Code please. Merge using data step or proc sql? Are the combinations of your key variables repeated in either or both data sets?

Were there any warning messages?

And probably best to go back to the step that combined them then try to fix other stuff later.

dereck255
Fluorite | Level 6

I merged by a data merge by statement.

I'll have to double check, but there were no errors. I confirmed that I sorted both data sets by the same variables that I merged them by. I think the issue lies around one of the cat variables - I had to remove a character string from one of the data sets so that the names matched across data sets. When I merged by that cat variable, I get the issue that I'm having now (see original post).

I'm not familiar with update. What exactly does it do?

Thank you for your help!

Reeza
Super User

The 'trick' is having the same data set as the master and transaction dataset.

From the documentation:


In a DATA step, the UPDATE statement reads observations from the transaction data set and updates corresponding observations (observations with the same value of all BY variables) from the master data set. All nonmissing values for variables in the transaction data set replace the corresponding values that are read from the master data set. SAS writes the modified observations to the data set that you name in the DATA statement without modifying either the master or the transaction data set.

Step-by-Step Programming with Base SAS(R) Software

Reeza
Super User

You can use the update trick instead:

data want;

update have(obs=0) have;

by name id1 id2 cat1 cat2 cat3;

run;

You may want to figure out why your merge didn't work though, because it is one way to do it.

dereck255
Fluorite | Level 6

I figured it out! I had to trim some values in the one data set - here it was leaving extra spaces. I used TRIM and now when I merge, it merges great!

Thanks for your help.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2140 views
  • 0 likes
  • 3 in conversation