Solved: How to merge data sets based on a if statement?

codyV · Posted 08-23-2019 08:45 AM

Hi,

I would want to merge 2 data sets, where in the tables look like this:

table1:

SID CID Flag country

1 2 name US

2 4 name US

5 non-name NZ

7 non-name UK

table 2:

SID CID FLAG Country status

1 2 name US active

2 4 name US active

5 non-name NZ active

7 non-name UK active

10 6 name UK active

so i want to merge the above tables

if SID is not equal to blank or flag is name then merge the table by SID and country

else if SID is equal to blank or flag is non-name then merge the above table by CID and country.

Note:- if there is an SID it will have the flag as 'name'

The data step i am using is very basic and it doesnt work.

data want;

set have1 (keep= SID CID FLAG COUNTRY)

have2 (keep=SID CID FLAG COUNTRY STATUS);

by SID CID COUNTRY

IF FLAG = NAMED output SID;

else output CID;

run;

i wonder if there is a better way to merge with this condition. Appreciate your time and help!

novinosrin · Posted 08-25-2019 05:04 PM

The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.

The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches.

The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.

Hope that helps.

View solution in original post

novinosrin · Posted 08-23-2019 08:47 AM

Hi @codyV Can you please post the expected output for the input samples?

codyV · Posted 08-23-2019 08:53 AM

Accepted Output:
SID CID Flag country status
1 2 name US active
2 4 name US active
5 non-name NZ active
7 non-name UK active

I need to merge the data by SID for the flag- name which is the first priority!
Appreciate your help!

novinosrin · Posted 08-23-2019 09:00 AM

Hope this helps

data table1;
input SID       CID      Flag   $           country $;
cards;
1            2         name              US
2            4         name              US
.              5         non-name       NZ
.              7         non-name       UK
;

data table2;
input SID       CID     (FLAG             Country          status) ($);
cards;
1            2         name              US                active
2            4         name              US                active
.              5         non-name       NZ                active
 .             7         non-name       UK                active
10          6         name               UK               active
;

proc sql;
create table want as
select a.*,status
from table1 a, table2 b
where (a.SID>. or a.flag='name') and (a.sid=b.sid and a.country=b.country) or
(a.SID=. or a.flag='non-name') and (a.cid=b.cid and a.country=b.country);
quit;

novinosrin · Posted 08-23-2019 09:10 AM

Hello @codyV

On taking a closer look, methinks all variables in Table1 can be used as keys

data table1;
input SID       CID      Flag   $           country $;
cards;
1            2         name              US
2            4         name              US
.              5         non-name       NZ
.              7         non-name       UK
;

data table2;
input SID       CID     (FLAG             Country          status) ($);
cards;
1            2         name              US                active
2            4         name              US                active
.              5         non-name       NZ                active
 .             7         non-name       UK                active
10          6         name               UK               active
;


data want ;
if _n_=1 then do;
   dcl hash H (dataset:'table1') ;
   h.definekey  (all:'y') ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set table2;
if h.find()=0;
run;

codyV · Posted 08-25-2019 04:45 PM

Proc sql did work but it has been taking too much time to run the data behind....! Many Thanks & appreciate your support

PeterClemmensen · Posted 08-23-2019 08:54 AM

Hi and welcome to the SAS Communities 🙂

What does your desired result look like? Makes it easier to provide a usable code answer.

Also, if your data is representable, your request is equivalent to merging on CID and Country since they are both present and non missing in both data sets?

The DATA to DATA Step Macro
Blog: SASnrd

codyV · Posted 08-23-2019 09:09 AM

Apologies for the mere example given earlier!

This is how the tables look and the expected result in precise:

Table 1:-

SID	CID	FLAG	COUNTRY
1	3	name	US
2		name	US
4	4	name	UK
	6	non-name	AUS
	8	non-name	NZ
	9	non-name	UK
9	7	name	US
	10	non-name	US
Table 2:-
SID	CID	Flag	COUNTRY	STATUS
1	3	name	US	active
2		name	US	active
5		name		active
	6	non-name	AUS	active
7		name		active
8		name		active
	8	non-name	NZ	active
	9	non-name	UK	active
9	7	name	US	active


Result:-
SID	CID	Flag	COUNTRY	STATUS
1	3	name	US	active
2		name	US	active
	6	non-name	AUS	active
	8	non-name	NZ	active
	9	non-name	UK	active
9	7	name	US	active

novinosrin · Posted 08-23-2019 09:18 AM

data table1;
infile cards expandtabs truncover;
input SID       CID      Flag   $           country $;
cards;
1	3	name	US	 
2	 .	name	US	 
4	4	name	UK	 
 .	6	non-name	AUS	 
 .	8	non-name	NZ	 
 .	9	non-name	UK	 
9	7	name	US	 
 .	10	non-name	US
;

data table2;
infile cards expandtabs truncover;
input SID       CID     (FLAG             Country          status) ($);
cards;
1	3	name	US	active
2	. 	name	US	active
5	. 	name	 .	active
. 	6	non-name	AUS	active
7	. 	name	 .	active
8	. 	name	 .	active
. 	8	non-name	NZ	active
. 	9	non-name	UK	active
9	7	name	US	active
;


data want ;
if _n_=1 then do;
   dcl hash H (dataset:'table1') ;
   h.definekey  (all:'y') ;
   h.definedata (all:'y') ;
   h.definedone () ;
end;
set table2;
if h.find()=0;
run;

RESULT:

1 3 name US active
2 . name US active
. 6 non-name AUS active
. 8 non-name NZ active
. 9 non-name UK active
9 7 name US active

codyV · Posted 08-25-2019 04:48 PM

if you help me understand the do statement bit, as to how will it help me provide the expected result

novinosrin · Posted 08-25-2019 05:04 PM

The statements between DO-END creates an instance of a look up table in memory aka Hash memory table. During execution of the datastep, values are loaded from table1 into the LOOK UP table. It's instance is name h. It is of type hash which a non-scalar value or in other words will hold instance pointing to the values from table1 that is loaded.

The find() method is a look up method to look up values in table1 from table2 using key values that match.When the match is found, the dot-notation method results in 0 implicitly for true matches and a non zero values for non-matches.

The condition if h.find()=0 requests for only true matches to be written to the output. In a nutshell, we are doing a lookup operation that's an equivalent of a join.

Hope that helps.

How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

Re: How to merge data sets based on a if statement?

SAS Innovate 2025: Save the Date