BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LaiQ
Calcite | Level 5

Hi all,

 

I have problems when merging the data as I only get the value once:

(This is a simplified example but I need individual lines of observation, although they are the same. Hence, I created a matchingkey to combine the ID and Types which is a unique key that matches. Modified just indicates that I have modify the original data.)

 

data dataset1;
  input ID Types $ Date Matchingkey $;
  cards;
1  Apple Jan2016 1Apple
1  Apple Jan2016 1Apple
2  Orange . 2Orange
2  Orange . 2Orange
2  Apple Feb2016 2Apple
;

data dataset2;
  input Matchingkey $ Date Modified $;
  cards;
2Orange Jan2016 Y
;

proc sort data=dataset1;
  by Matchingkey;
run;

proc sort data=dataset2;
  by Matchingkey;
run;

data want;
  merge dataset1 (in=a) dataset2;
  by Matchingkey;
  if a;
run;

However, the output I get will be only:

ID Types Date Matchingkey Modified
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange Jan2016 2Orange Y
2 Orange . 2Orange Y
2 Apple Feb2016 2Apple

 

 Is there any way to merge the date to both 2Orange?

 

Really appreciate help for this!

Thank you! 😄

1 ACCEPTED SOLUTION

Accepted Solutions
JohnHoughton
Quartz | Level 8

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;


proc sql ;
create table want1 as
select id, types, coalesce(d1.date,d2.date) as date format monyy7. , d1.matchingkey
from dataset1 as d1 left outer join dataset2 as d2
on d1.matchingkey=d2.matchingkey;
run;

There is also an advantage of not needing to sort the datasets first.

 

If you wanted to not use the matchingkey variable and have the variables id & types in both
datasets, then you could change the match conditions of the sql procedure to

 

on d1.id=d2.id AND d1.types=d2.types

 

View solution in original post

13 REPLIES 13
JohnHoughton
Quartz | Level 8

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;


proc sql ;
create table want1 as
select id, types, coalesce(d1.date,d2.date) as date format monyy7. , d1.matchingkey
from dataset1 as d1 left outer join dataset2 as d2
on d1.matchingkey=d2.matchingkey;
run;

There is also an advantage of not needing to sort the datasets first.

 

If you wanted to not use the matchingkey variable and have the variables id & types in both
datasets, then you could change the match conditions of the sql procedure to

 

on d1.id=d2.id AND d1.types=d2.types

 

LaiQ
Calcite | Level 5

Thank you! Really appreciate it cus I'm not well-versed with SQL but will try it! 😄

JohnHoughton
Quartz | Level 8

Hi @LaiQ

The following will also work, using a data step merge.

I would still prefer to use the proc sql join though.

 

data dataset1;
input ID Types $ Date :monyy7. Matchingkey $;
cards;
1 Apple Jan2016 1Apple
1 Apple Jan2016 1Apple
2 Orange . 2Orange
2 Orange . 2Orange
2 Apple Feb2016 2Apple
;
run;
data dataset2;
input Matchingkey $ Date :monyy7. Modified $;
cards;
2Orange Jan2016 Y
;
run;

proc sort data=dataset1; by matchingkey;
proc sort data=dataset2; by matchingkey;

data want (drop= d1 d2);
merge dataset1 (rename=(date=d1)) dataset2 (rename=(date=d2));
by matchingkey;
date=coalesce(d1,d2);
format date monyy7.;
run;

LaiQ
Calcite | Level 5

Yes, after trying, I think SQL runs much much faster than proc merge. However, will it be able to replace missing values only? Meaning to read that d1.date is missing then will match over, else if d1.date is filled - do not match/join.

Tom
Super User Tom
Super User

That is what the COALESCE() function does. It will take the first non-missing value amoung the list of arguments.

Tom
Super User Tom
Super User

Note there is not any need to combine variables to create a single variable that can be used as a key.

You can just reference both variables in your BY statement.

 

data want ;
  merge one two ;
  by id type ;
run;
LaiQ
Calcite | Level 5

Noted! However, in my actual data, I have 8 variables that needs to match exactly. Hence, it might take a long time for SAS to sort and I have over 8 million observations... Hence, perhaps a matching key will save me time.

Tom
Super User Tom
Super User

It will not take any longer to sort on 8 variables than on one.

You will just have to write more code, espcecially if using PROC SQL since you will need to wite out all of the join conditions.

LaiQ
Calcite | Level 5

Yes, actually I used SQL with 7 AND condition. It does work fine, but just that now all the observations in my want dataset contains MODIFIED = "Y" because it matches all the condition but I want this to apply to only the observations with empty dates. Do I have to extract all the empty dates from the dataset1 first, merged with dataset2, then patch it back to my dataset1? Meaning my dataset2 contains 1Apple as well - as I'm using the full dataset.

Tom
Super User Tom
Super User

Not sure what you mean so let me re-phrase it and then answer.

You want to update one variable, DATE, in the first dataset with another variable, also named DATE, from another dataset. But you only want to use the records from the second dataset that indicate they are modifications, MODIFIED='Y'.  You have 8 key variables, call them ID1-ID8.

 

 

If that is all that you want to do then obviously you only want to read the records from the second dataset that have MODIFIED='Y'.  Not sure why there are any other records there if its purpose is to modify the first dataset but the easiest way to make sure you don't use the other records is to exclude them.

 

So here is data step code that combines the data set and modifes DATE only when the MODIFIED flag is set.  

 

data want ;
  merge one
        two (keep=id1-id8 date modified rename=(date=newdate)
                     where=(modified='Y') )
  ;
  by id1-id8;
  if modified='Y' then date=newdate;
  drop newdate modified;
run;

 

LaiQ
Calcite | Level 5

Ooh, I see. That really explains much clearer. I will go try it out.

Really sorry for the confusion, but you really helped me a lot! Thank you so much!

JohnHoughton
Quartz | Level 8

Adapting the SQL solution for the case to match the conditions that @Tom has suggested

 

 

proc sql ;
create table want1 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 as d2
on d1.id=d2.id and d1.types=d2.types and d2.modified='Y';
run;

 

Or  another alternative

 

proc sql ;
create table want2 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 (where=(modified='Y')) as d2
on d1.id=d2.id and d1.types=d2.types ;
run;
;

 

You also stated that you "want this to apply to only the observations with empty dates" .

So yet another variation would be

 

proc sql ;
create table want3 as 
select d1.id, d1.types, coalesce(d1.date,d2.date) as date format monyy7., modified
from dataset1 as d1 left outer join dataset2 as d2
on d1.id=d2.id and d1.types=d2.types and d1.date=.;
run;

 

 

LaiQ
Calcite | Level 5

Thank you @JohnHoughton! The last one works as what I want it to be! 

Really appreciate your time to get back to me. Thank you! 😄

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1928 views
  • 0 likes
  • 3 in conversation