DATA Step, Macro, Functions and more

Merging data based on (variable names from 1 data file) and (variables value from another data file)

Accepted Solution Solved
Reply
Super Contributor
Posts: 420
Accepted Solution

Merging data based on (variable names from 1 data file) and (variables value from another data file)

[ Edited ]

Hi Everyone,

I have a FULL_data file in which each date has a number of characteristics.

I have another condition file that specifies condition for a give 2 characteristic value.

 

For any condition in condition file, I want to pull all date from FULL_data that meet this condition.

(in my real work, the condition could be 4 or 5 characteristics )

 

I don't know how to get that kind of merge.

Any help is very much appreciated.

 

HHC
The WANT file

condition_ID

condt1

value1

cond2

value2

date

1

a

4

b

5

11

1

a

4

b

5

12

2

c

1

d

2

14

 

data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
;run;

 

data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;

 

*************************

Also, I want to have the WANT file for all records that meet 1 of the 2 conditions (OR), please give me a hint if it is not long.

Thank you.

 


*After getting the condition-combination file,
this code will help to merge the condition-combination file with the Full data (retunr) file

In the output file, each condition has all record that meet it. and can be use to analyze quality of condition;


data FULLDATA; 
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 0 1 2 2
14 8 5 1 2
20 4 5 1 2
;run;
 
data condition; 
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;

******************************************************************************************
*Combination of AND condition1 and condition2;

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
	put 'if (' cond1 '=' value1 'AND ' cond2 '=' value2 ')'
	' then do; condition_id=' condition_id '; output;end;' ;
	*if last_cond=0  then put 'else';
run;

options source2;
data want_and;
  set fulldata;
  %include tmp;
run;
proc sort data=want_and; by condition_id date;run;

		*look into the temfile, the second code is like that;
		data want2;
		  set fulldata;
		if (a =4 and b =5)  then do; condition_id=1 ; output;end;
		if (c =1 and d =2)  then do; condition_id=2 ; output;end;
		run;

******************************************************************************************
*Combination of OR condition1 and condition2;

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
	put 'if (' cond1 '=' value1 'OR ' cond2 '=' value2 ')' 
	' then do; condition_id=' condition_id '; output;end;' ;
	*if last_cond=0  then put 'else';
run;


options source2;
data want_or;
  set fulldata;
  %include tmp;
run;
proc sort; by condition_id date;run;

		*look into the temfile, the second code is like that;
		data want2;
		  set fulldata;
		if (a =4 OR b =5)  then do; condition_id=1 ; output;end;
		if (c =1 OR d =2)  then do; condition_id=2 ; output;end;
		run;
		proc sort; by condition_id date;run;

Accepted Solutions
Solution
‎02-07-2017 08:38 PM
Trusted Advisor
Posts: 1,022

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

Use  sql  "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:

 

proc sql noprint;
   select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
     into  :cond
     separated by ' or '
   from condition;
quit;

%put &=cond;
data want;
 set fulldata;
 where &cond ;
run;

 

And change the "AND" to "OR" in the  select distinct expression to satisfy your 2nd requirement.

 

View solution in original post


All Replies
Solution
‎02-07-2017 08:38 PM
Trusted Advisor
Posts: 1,022

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

Use  sql  "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:

 

proc sql noprint;
   select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
     into  :cond
     separated by ' or '
   from condition;
quit;

%put &=cond;
data want;
 set fulldata;
 where &cond ;
run;

 

And change the "AND" to "OR" in the  select distinct expression to satisfy your 2nd requirement.

 

Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

Thank you for your help.

 

I wonder if there is any way to get the  Condition_ID in the want file?

 

HHC

Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

Since my condition file is large, I create the below marco to run 1 condition at a time.

I wonder how I can tell SAS to do the:  DO i=1 to "END of File" in the first step.

Also, is there any way to include the condition_id from the original file into the Want file?

 

Thank you for your help.

HHC


%macro cond;
	%do i=1 %to 2;	*HOW TO MAKE IT: i=1 to END OF FILE ;
		data single_condition; set condition;
		if _N_=&i;
		run;
 
		proc sql ;
		   select distinct  '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
		     into  :cond
		     separated by ' or '
		   from single_condition;
		quit;

		%put &=cond;
			data want_temp;
			 set fulldata;
			 where &cond ;
			 condition_id=&i; *I have to get the Condition_ID manually instead of pulling it from original file;
			run;

		%if &i=1 %then %do;
		data want; set want_temp;
		%end;
		%else %do;
		data want; set want want_temp;
		%end;
	%end;
%mend;

%cond;run;
Trusted Advisor
Posts: 1,022

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

If (1) there are too many conditions in the CONDITION datast, and (2) you want to label conditions, then you could forget the SQL approach and use a DATA _NULL_ step to write out a series of statements like

 

  if   x1=1 and y1=1 then condition_id=1;

  else   if   x2=12and y2=12then condition_id=2;

 

for use in the "data want" step.  Then a subsetting if "if condition_id>=1" gets you what you want.  There is definitely no need to iterate through the conditions, assuming they are mutually exclusive: 

 

filename tmp temp;
data _null_;
  set condition end=last_cond;
  file tmp;
  put 'if ' cond1 '=' value1 'and ' cond2 '=' value2  
      ' then condition_id=' condition_id ';' @;
  if last_cond=0  then put 'else';
run;

options source2;
data want;
  set fulldata;
  %include tmp;
  if condition_id >=1;
run;

 

Notes:

  1. The "filename tmp temp" says to assign the name "TMP" to a temporary file (to be automatically deleted at end of sas session.
  2. The FILE TMP tells SAS that the PUT statements will write to TMP in the first DATA step
  3. The "%include tmp;" statement tells sas to read statement from the TMP file  (and the "options source2" statement tells SAS to print to the log any %INCLUDEd statements, so you can see what it happening.
  4. If a record satisfies multiple conditions, only the first one will be identified in this program.  (You loop program would identify them all - and would also duplicate the corresponding records from FULLDATA.
Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

I want to thank you first for details explanation.
I cannot comprehend it as I am on call right now.
I will study it.
HHC
Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

Hi MKeintz,

The current code eliminates duplicate record from Fulldata.
I actually want to have that duplicate for each condition it satisfies.

In the below data, I add the last record (15) and it should be in both condition.

I am not sure how to change your code to get it.

Can you help me?

Thank you,

HCC


data condition;
input condition_ID cond1 :$1. value1 cond2 :$1. value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
15 4 5 1 2
;run;

Trusted Advisor
Posts: 1,022

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

To get such "duplicates" change these two lines in the first data step:

 

put 'if ' cond1 '=' value1 'and ' cond2 '=' value2

   ' then condition_id=' condition_id ';' @;

if last_cond=0 then put 'else';

 

to this line:

 

put 'if ' cond1 '=' value1 'and ' cond2 '=' value2

' then do; condition_id=' condition_id '; output;end;' ;

 

 

And in the second data step, remove this line:

 

if condition_id >=1;

 

Result: if a single record satisfies multiple conditions, it will be output multiple times. 

 

Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

It works now, Mkeintz!!!

 

it is a powerful approach, I don't know that kind of method before.

 

I can read the tmp file to see what the "if then" command.

 

So basically, the %include tmp will turn into this 


data want;
  set fulldata;
if a =4 and b =5  then condition_id=1 ;if c =1 and d =2  then condition_id=2 ;
run;

But where the Merge happen? I do not see something like that. Where it is built in your code?


data want1;
  set fulldata;
if  a =4 and b =5 ;
run;

data want2;
  set fulldata;
if  c =1 and d =2  ;
run;

data want; set want1 want2;run;
Trusted Advisor
Posts: 1,022

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

xxx

 


hhchenfx wrote:

It works now, Mkeintz!!!

 

it is a powerful approach, I don't know that kind of method before.

 

I can read the tmp file to see what the "if then" command.

 

So basically, the %include tmp will turn into this 


data want;
  set fulldata;
if a =4 and b =5  then condition_id=1 ;if c =1 and d =2  then condition_id=2 ;
run;

But where the Merge happen? I do not see something like that. Where it is built in your code?


data want1;
  set fulldata;
if  a =4 and b =5 ;
run;

data want2;
  set fulldata;
if  c =1 and d =2  ;
run;

data want; set want1 want2;run;

 

 

No.  Itr will turn into:

 

data want;

  set fulldata;

  if a=4 and b=5 then do; condition_id=1; output; end;

  if c=1 and d=2 then do; condition_id=2; output; end;

 run;

 

Notice there is an OUTPUT statement for every successful IF test.

Super Contributor
Posts: 420

Re: Merging data based on (variable names from 1 data file) and (variables value from another data f

that's output command is so good.

Thanks,

HHC

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 10 replies
  • 188 views
  • 0 likes
  • 2 in conversation