Hi,
I would want to merge 2 data sets, where in the tables look like this:
table1:
SID CID Flag country
1 2 name US
2 4 name US
5 non-name NZ
7 non-name UK
table 2:
SID CID FLAG Country status
1 2 name US active
2 4 name US active
5 non-name NZ active
7 non-name UK active
10 6 name UK active
so i want to merge the above tables
if SID is not equal to blank or flag is name then merge the table by SID and country
else if SID is equal to blank or flag is non-name then merge the above table by CID and country.
Note:- if there is an SID it will have the flag as 'name'
The data step i am using is very basic and it doesnt work.
data want;
set have1 (keep= SID CID FLAG COUNTRY)
have2 (keep=SID CID FLAG COUNTRY STATUS);
by SID CID COUNTRY
IF FLAG = NAMED output SID;
else output CID;
run;
i wonder if there is a better way to merge with this condition. Appreciate your time and help!
The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.
The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches.
The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.
Hope that helps.
Hi @codyV Can you please post the expected output for the input samples?
Hope this helps
data table1;
input SID CID Flag $ country $;
cards;
1 2 name US
2 4 name US
. 5 non-name NZ
. 7 non-name UK
;
data table2;
input SID CID (FLAG Country status) ($);
cards;
1 2 name US active
2 4 name US active
. 5 non-name NZ active
. 7 non-name UK active
10 6 name UK active
;
proc sql;
create table want as
select a.*,status
from table1 a, table2 b
where (a.SID>. or a.flag='name') and (a.sid=b.sid and a.country=b.country) or
(a.SID=. or a.flag='non-name') and (a.cid=b.cid and a.country=b.country);
quit;
Hello @codyV
On taking a closer look, methinks all variables in Table1 can be used as keys
data table1;
input SID CID Flag $ country $;
cards;
1 2 name US
2 4 name US
. 5 non-name NZ
. 7 non-name UK
;
data table2;
input SID CID (FLAG Country status) ($);
cards;
1 2 name US active
2 4 name US active
. 5 non-name NZ active
. 7 non-name UK active
10 6 name UK active
;
data want ;
if _n_=1 then do;
dcl hash H (dataset:'table1') ;
h.definekey (all:'y') ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set table2;
if h.find()=0;
run;
Hi and welcome to the SAS Communities 🙂
What does your desired result look like? Makes it easier to provide a usable code answer.
Also, if your data is representable, your request is equivalent to merging on CID and Country since they are both present and non missing in both data sets?
Apologies for the mere example given earlier!
This is how the tables look and the expected result in precise:
Table 1:-
SID | CID | FLAG | COUNTRY | |
1 | 3 | name | US | |
2 | name | US | ||
4 | 4 | name | UK | |
6 | non-name | AUS | ||
8 | non-name | NZ | ||
9 | non-name | UK | ||
9 | 7 | name | US | |
10 | non-name | US | ||
Table 2:- | ||||
SID | CID | Flag | COUNTRY | STATUS |
1 | 3 | name | US | active |
2 | name | US | active | |
5 | name | active | ||
6 | non-name | AUS | active | |
7 | name | active | ||
8 | name | active | ||
8 | non-name | NZ | active | |
9 | non-name | UK | active | |
9 | 7 | name | US | active |
Result:- | ||||
SID | CID | Flag | COUNTRY | STATUS |
1 | 3 | name | US | active |
2 | name | US | active | |
6 | non-name | AUS | active | |
8 | non-name | NZ | active | |
9 | non-name | UK | active | |
9 | 7 | name | US | active |
data table1;
infile cards expandtabs truncover;
input SID CID Flag $ country $;
cards;
1 3 name US
2 . name US
4 4 name UK
. 6 non-name AUS
. 8 non-name NZ
. 9 non-name UK
9 7 name US
. 10 non-name US
;
data table2;
infile cards expandtabs truncover;
input SID CID (FLAG Country status) ($);
cards;
1 3 name US active
2 . name US active
5 . name . active
. 6 non-name AUS active
7 . name . active
8 . name . active
. 8 non-name NZ active
. 9 non-name UK active
9 7 name US active
;
data want ;
if _n_=1 then do;
dcl hash H (dataset:'table1') ;
h.definekey (all:'y') ;
h.definedata (all:'y') ;
h.definedone () ;
end;
set table2;
if h.find()=0;
run;
RESULT:
1 3 name US active
2 . name US active
. 6 non-name AUS active
. 8 non-name NZ active
. 9 non-name UK active
9 7 name US active
The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.
The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches.
The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.
Hope that helps.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.