Hello,
I have a data set that has a large about of information between two systems. The observations in the data include a Name, an ID1, ID2, Cat1, Cat2, Cat3, Date in System1, Date in System2.
I expect all entries to have dates in both systems. Right now, each "Name" is listed twice:
Name | ID1 | ID2 | Cat1 | Cat2 | Cat3 | DateSys1 | DateSys2 |
---|---|---|---|---|---|---|---|
Adam | 1001 | 2002 | A | B | C | 11Jan2015 | . |
Adam | 1001 | 2002 | A | B | C | . | 12Jan2015 |
What I would like to see is:
Name | ID1 | ID2 | Cat1 | Cat2 | Cat3 | DateSys1 | DateSys2 |
---|---|---|---|---|---|---|---|
Adam | 1001 | 2002 | A | B | C | 11Jan2015 | 12Jan2015 |
Where the duplicate entry is merged based on the observations being equal in Name, ID1, ID2, Cat1, Cat2, Cat3.
Help? Thank you!
How did you merge the data?
I merged the data sets by Name, ID1, ID2, Cat1, Cat2, Cat3 but it didn't exactly work. So how can I do it after the datasets merged?
Code please. Merge using data step or proc sql? Are the combinations of your key variables repeated in either or both data sets?
Were there any warning messages?
And probably best to go back to the step that combined them then try to fix other stuff later.
I merged by a data merge by statement.
I'll have to double check, but there were no errors. I confirmed that I sorted both data sets by the same variables that I merged them by. I think the issue lies around one of the cat variables - I had to remove a character string from one of the data sets so that the names matched across data sets. When I merged by that cat variable, I get the issue that I'm having now (see original post).
I'm not familiar with update. What exactly does it do?
Thank you for your help!
The 'trick' is having the same data set as the master and transaction dataset.
From the documentation:
In a DATA step, the UPDATE statement reads observations from the transaction data set and updates corresponding observations (observations with the same value of all BY variables) from the master data set. All nonmissing values for variables in the transaction data set replace the corresponding values that are read from the master data set. SAS writes the modified observations to the data set that you name in the DATA statement without modifying either the master or the transaction data set.
You can use the update trick instead:
data want;
update have(obs=0) have;
by name id1 id2 cat1 cat2 cat3;
run;
You may want to figure out why your merge didn't work though, because it is one way to do it.
I figured it out! I had to trim some values in the one data set - here it was leaving extra spaces. I used TRIM and now when I merge, it merges great!
Thanks for your help.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.