Re: Merge, not delete, duplicates.

dereck255 · Posted 01-27-2015 05:07 PM

Hello,

I have a data set that has a large about of information between two systems. The observations in the data include a Name, an ID1, ID2, Cat1, Cat2, Cat3, Date in System1, Date in System2.

I expect all entries to have dates in both systems. Right now, each "Name" is listed twice:

Name	ID1	ID2	Cat1	Cat2	Cat3	DateSys1	DateSys2
Adam	1001	2002	A	B	C	11Jan2015	.
Adam	1001	2002	A	B	C	.	12Jan2015

What I would like to see is:

Name	ID1	ID2	Cat1	Cat2	Cat3	DateSys1	DateSys2
Adam	1001	2002	A	B	C	11Jan2015	12Jan2015

Where the duplicate entry is merged based on the observations being equal in Name, ID1, ID2, Cat1, Cat2, Cat3.

Help? Thank you!

ballardw · Posted 01-27-2015 05:29 PM

How did you merge the data?

dereck255 · Posted 01-27-2015 06:55 PM

I merged the data sets by Name, ID1, ID2, Cat1, Cat2, Cat3 but it didn't exactly work. So how can I do it after the datasets merged?

ballardw · Posted 01-27-2015 07:30 PM

Code please. Merge using data step or proc sql? Are the combinations of your key variables repeated in either or both data sets?

Were there any warning messages?

And probably best to go back to the step that combined them then try to fix other stuff later.

dereck255 · Posted 01-27-2015 07:46 PM

I merged by a data merge by statement.

I'll have to double check, but there were no errors. I confirmed that I sorted both data sets by the same variables that I merged them by. I think the issue lies around one of the cat variables - I had to remove a character string from one of the data sets so that the names matched across data sets. When I merged by that cat variable, I get the issue that I'm having now (see original post).

I'm not familiar with update. What exactly does it do?

Thank you for your help!

Reeza · Posted 01-27-2015 08:01 PM

The 'trick' is having the same data set as the master and transaction dataset.

From the documentation:

In a DATA step, the UPDATE statement reads observations from the transaction data set and updates corresponding observations (observations with the same value of all BY variables) from the master data set. All nonmissing values for variables in the transaction data set replace the corresponding values that are read from the master data set. SAS writes the modified observations to the data set that you name in the DATA statement without modifying either the master or the transaction data set.

Step-by-Step Programming with Base SAS(R) Software

Reeza · Posted 01-27-2015 07:33 PM

You can use the update trick instead:

data want;

update have(obs=0) have;

by name id1 id2 cat1 cat2 cat3;

run;

You may want to figure out why your merge didn't work though, because it is one way to do it.

dereck255 · Posted 01-27-2015 10:43 PM

I figured it out! I had to trim some values in the one data set - here it was leaving extra spaces. I used TRIM and now when I merge, it merges great!

Thanks for your help.

SAS Training: Just a Click Away