BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
codyV
Fluorite | Level 6

Hi,

 

I would want to merge 2 data sets, where in the tables look like this:

table1:

SID       CID      Flag              country

1            2         name              US

2            4         name              US

              5         non-name       NZ

              7         non-name       UK

 

table 2:

SID       CID     FLAG             Country          status

1            2         name              US                active

2            4         name              US                active

              5         non-name       NZ                active

              7         non-name       UK                active

10          6         name               UK               active

 

 

so i want to merge the above tables 

if SID is not equal to blank or flag is name then merge the table by SID and country

else if SID is equal to blank or flag is non-name then merge the above table by CID and country.

 

Note:- if there is an SID it will have the flag as 'name'

 

The data step i am using is very basic and it doesnt work.

data want;

set have1 (keep= SID CID FLAG COUNTRY)

have2 (keep=SID CID FLAG COUNTRY STATUS);

by SID CID COUNTRY

IF FLAG = NAMED output SID;

else output CID;

run;

 

i wonder if there is a better way to merge with this condition. Appreciate your time and help!

1 ACCEPTED SOLUTION

Accepted Solutions
novinosrin
Tourmaline | Level 20

The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.

 

The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches

 

The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.

 

Hope that helps. 

View solution in original post

10 REPLIES 10
novinosrin
Tourmaline | Level 20

Hi @codyV   Can you please post the expected output for the input samples?

codyV
Fluorite | Level 6
Accepted Output:
SID CID Flag country status
1 2 name US active
2 4 name US active
5 non-name NZ active
7 non-name UK active

I need to merge the data by SID for the flag- name which is the first priority!
Appreciate your help!
novinosrin
Tourmaline | Level 20

Hope this helps

 

data table1;
input SID       CID      Flag   $           country $;
cards;
1            2         name              US
2            4         name              US
.              5         non-name       NZ
.              7         non-name       UK
;

data table2;
input SID       CID     (FLAG             Country          status) ($);
cards;
1            2         name              US                active
2            4         name              US                active
.              5         non-name       NZ                active
 .             7         non-name       UK                active
10          6         name               UK               active
;

proc sql;
create table want as
select a.*,status
from table1 a, table2 b
where (a.SID>. or a.flag='name') and (a.sid=b.sid and a.country=b.country) or
(a.SID=. or a.flag='non-name') and (a.cid=b.cid and a.country=b.country);
quit;
novinosrin
Tourmaline | Level 20

Hello @codyV   

 

On taking a closer look, methinks all variables in Table1  can be used as keys

 

data table1;
input SID       CID      Flag   $           country $;
cards;
1            2         name              US
2            4         name              US
.              5         non-name       NZ
.              7         non-name       UK
;

data table2;
input SID       CID     (FLAG             Country          status) ($);
cards;
1            2         name              US                active
2            4         name              US                active
.              5         non-name       NZ                active
 .             7         non-name       UK                active
10          6         name               UK               active
;


data want ;
if _n_=1 then do;
   dcl hash H (dataset:'table1') ;
   h.definekey  (all:'y') ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set table2;
if h.find()=0;
run;
codyV
Fluorite | Level 6
Proc sql did work but it has been taking too much time to run the data behind....! Many Thanks & appreciate your support
PeterClemmensen
Tourmaline | Level 20

Hi and welcome to the SAS Communities 🙂

 

What does your desired result look like? Makes it easier to provide a usable code answer. 

 

Also, if your data is representable, your request is equivalent to merging on CID and Country since they are both present and non missing in both data sets?

codyV
Fluorite | Level 6

Apologies for the mere example given earlier!

 

This is how the tables look and the expected result in precise:

 

Table 1:-

SIDCID FLAG COUNTRY 
13nameUS 
2 nameUS 
44nameUK 
 6non-nameAUS 
 8non-nameNZ 
 9non-nameUK 
97nameUS 
 10non-nameUS 
Table 2:-     
SID  CIDFlagCOUNTRYSTATUS
13nameUSactive
2 nameUSactive
5 name active
 6non-nameAUSactive
7 name active
8 name active
 8non-nameNZactive
 9non-nameUKactive
97nameUSactive
     
     
Result:-     
SID  CIDFlagCOUNTRYSTATUS
13nameUSactive
2 nameUSactive
 6non-nameAUSactive
 8non-nameNZactive
 9non-nameUKactive
97nameUSactive

 

novinosrin
Tourmaline | Level 20
data table1;
infile cards expandtabs truncover;
input SID       CID      Flag   $           country $;
cards;
1	3	name	US	 
2	 .	name	US	 
4	4	name	UK	 
 .	6	non-name	AUS	 
 .	8	non-name	NZ	 
 .	9	non-name	UK	 
9	7	name	US	 
 .	10	non-name	US
;

data table2;
infile cards expandtabs truncover;
input SID       CID     (FLAG             Country          status) ($);
cards;
1	3	name	US	active
2	. 	name	US	active
5	. 	name	 .	active
. 	6	non-name	AUS	active
7	. 	name	 .	active
8	. 	name	 .	active
. 	8	non-name	NZ	active
. 	9	non-name	UK	active
9	7	name	US	active
;


data want ;
if _n_=1 then do;
   dcl hash H (dataset:'table1') ;
   h.definekey  (all:'y') ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set table2;
if h.find()=0;
run;

RESULT:

 

1 3 name US active
2 . name US active
. 6 non-name AUS active
. 8 non-name NZ active
. 9 non-name UK active
9 7 name US active

codyV
Fluorite | Level 6
if you help me understand the do statement bit, as to how will it help me provide the expected result
novinosrin
Tourmaline | Level 20

The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.

 

The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches

 

The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.

 

Hope that helps. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 10 replies
  • 1841 views
  • 4 likes
  • 3 in conversation