DATA Step, Macro, Functions and more

Merging with same value

Accepted Solution Solved
Reply
Contributor
Posts: 31
Accepted Solution

Merging with same value

[ Edited ]

Hi all,

 

I have problems when merging the data as I only get the value once:

(This is a simplified example but I need individual lines of observation, although they are the same. Hence, I created a matchingkey to combine the ID and Types which is a unique key that matches. Modified just indicates that I have modify the original data.)

 

data dataset1;
  input ID Types $ Date Matchingkey $;
  cards;
1  Apple Jan2016 1Apple
1  Apple Jan2016 1Apple
2  Orange . 2Orange
2  Orange . 2Orange
2  Apple Feb2016 2Apple
;

data dataset2;
  input Matchingkey $ Date Modified $;
  cards;
2Orange Jan2016 Y
;

proc sort data=dataset1;
  by Matchingkey;
run;

proc sort data=dataset2;
  by Matchingkey;
run;

data want;
  merge dataset1 (in=a) dataset2;
  by Matchingkey;
  if a;
run;

However, the output I get will be only:

ID Types Date Matchingkey Modified
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange Jan2016 2Orange Y
2 Orange . 2Orange Y
2 Apple Feb2016 2Apple

 

 Is there any way to merge the date to both 2Orange?

 

Really appreciate help for this!

Thank you! Smiley Very Happy


Accepted Solutions
Solution
‎07-05-2016 07:50 AM
Contributor
Posts: 41

Re: Merging with same value

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;


proc sql ;
create table want1 as
select id, types, coalesce(d1.date,d2.date) as date format monyy7. , d1.matchingkey
from dataset1 as d1 left outer join dataset2 as d2
on d1.matchingkey=d2.matchingkey;
run;

There is also an advantage of not needing to sort the datasets first.

 

If you wanted to not use the matchingkey variable and have the variables id & types in both
datasets, then you could change the match conditions of the sql procedure to

 

on d1.id=d2.id AND d1.types=d2.types

 

View solution in original post


All Replies
Solution
‎07-05-2016 07:50 AM
Contributor
Posts: 41

Re: Merging with same value

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;


proc sql ;
create table want1 as
select id, types, coalesce(d1.date,d2.date) as date format monyy7. , d1.matchingkey
from dataset1 as d1 left outer join dataset2 as d2
on d1.matchingkey=d2.matchingkey;
run;

There is also an advantage of not needing to sort the datasets first.

 

If you wanted to not use the matchingkey variable and have the variables id & types in both
datasets, then you could change the match conditions of the sql procedure to

 

on d1.id=d2.id AND d1.types=d2.types

 

Contributor
Posts: 31

Re: Merging with same value

Posted in reply to JohnHoughton

Thank you! Really appreciate it cus I'm not well-versed with SQL but will try it! Smiley Very Happy

Contributor
Posts: 41

Re: Merging with same value

Hi @LaiQ

The following will also work, using a data step merge.

I would still prefer to use the proc sql join though.

 

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;

proc sort data=dataset1; by matchingkey;
proc sort data=dataset2; by matchingkey;

data want (drop= d1 d2);
merge dataset1 (rename=(date=d1)) dataset2 (rename=(date=d2));
by matchingkey;
date=coalesce(d1,d2);
format date monyy7.;
run;

Contributor
Posts: 31

Re: Merging with same value

Posted in reply to JohnHoughton

Yes, after trying, I think SQL runs much much faster than proc merge. However, will it be able to replace missing values only? Meaning to read that d1.date is missing then will match over, else if d1.date is filled - do not match/join.

Super User
Super User
Posts: 7,074

Re: Merging with same value

That is what the COALESCE() function does. It will take the first non-missing value amoung the list of arguments.

Super User
Super User
Posts: 7,074

Re: Merging with same value

Note there is not any need to combine variables to create a single variable that can be used as a key.

You can just reference both variables in your BY statement.

 

data want ;
  merge one two ;
  by id type ;
run;
Contributor
Posts: 31

Re: Merging with same value

Noted! However, in my actual data, I have 8 variables that needs to match exactly. Hence, it might take a long time for SAS to sort and I have over 8 million observations... Hence, perhaps a matching key will save me time.

Super User
Super User
Posts: 7,074

Re: Merging with same value

It will not take any longer to sort on 8 variables than on one.

You will just have to write more code, espcecially if using PROC SQL since you will need to wite out all of the join conditions.

Contributor
Posts: 31

Re: Merging with same value

[ Edited ]

Yes, actually I used SQL with 7 AND condition. It does work fine, but just that now all the observations in my want dataset contains MODIFIED = "Y" because it matches all the condition but I want this to apply to only the observations with empty dates. Do I have to extract all the empty dates from the dataset1 first, merged with dataset2, then patch it back to my dataset1? Meaning my dataset2 contains 1Apple as well - as I'm using the full dataset.

Super User
Super User
Posts: 7,074

Re: Merging with same value

Not sure what you mean so let me re-phrase it and then answer.

You want to update one variable, DATE, in the first dataset with another variable, also named DATE, from another dataset. But you only want to use the records from the second dataset that indicate they are modifications, MODIFIED='Y'.  You have 8 key variables, call them ID1-ID8.

 

 

If that is all that you want to do then obviously you only want to read the records from the second dataset that have MODIFIED='Y'.  Not sure why there are any other records there if its purpose is to modify the first dataset but the easiest way to make sure you don't use the other records is to exclude them.

 

So here is data step code that combines the data set and modifes DATE only when the MODIFIED flag is set.  

 

data want ;
  merge one
        two (keep=id1-id8 date modified rename=(date=newdate)
                     where=(modified='Y') )
  ;
  by id1-id8;
  if modified='Y' then date=newdate;
  drop newdate modified;
run;

 

Contributor
Posts: 31

Re: Merging with same value

Ooh, I see. That really explains much clearer. I will go try it out.

Really sorry for the confusion, but you really helped me a lot! Thank you so much!

Contributor
Posts: 41

Re: Merging with same value

Adapting the SQL solution for the case to match the conditions that @Tom has suggested

 

 

proc sql ;
create table want1 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 as d2
on d1.id=d2.id and d1.types=d2.types and d2.modified='Y';
run;

 

Or  another alternative

 

proc sql ;
create table want2 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 (where=(modified='Y')) as d2
on d1.id=d2.id and d1.types=d2.types ;
run;
;

 

You also stated that you "want this to apply to only the observations with empty dates" .

So yet another variation would be

 

proc sql ;
create table want3 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 as d2
on d1.id=d2.id and d1.types=d2.types and d1.date=.;
run;

 

 

Contributor
Posts: 31

Re: Merging with same value

Posted in reply to JohnHoughton

Thank you @JohnHoughton! The last one works as what I want it to be! 

Really appreciate your time to get back to me. Thank you! Smiley Very Happy

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 653 views
  • 0 likes
  • 3 in conversation