Hi Everyone,
I have a FULL_data file in which each date has a number of characteristics.
I have another condition file that specifies condition for a give 2 characteristic value.
For any condition in condition file, I want to pull all date from FULL_data that meet this condition.
(in my real work, the condition could be 4 or 5 characteristics )
I don't know how to get that kind of merge.
Any help is very much appreciated.
HHC
The WANT file
condition_ID |
condt1 |
value1 |
cond2 |
value2 |
date |
1 |
a |
4 |
b |
5 |
11 |
1 |
a |
4 |
b |
5 |
12 |
2 |
c |
1 |
d |
2 |
14 |
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
;run;
data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
*************************
Also, I want to have the WANT file for all records that meet 1 of the 2 conditions (OR), please give me a hint if it is not long.
Thank you.
*After getting the condition-combination file,
this code will help to merge the condition-combination file with the Full data (retunr) file
In the output file, each condition has all record that meet it. and can be use to analyze quality of condition;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 0 1 2 2
14 8 5 1 2
20 4 5 1 2
;run;
data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
******************************************************************************************
*Combination of AND condition1 and condition2;
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if (' cond1 '=' value1 'AND ' cond2 '=' value2 ')'
' then do; condition_id=' condition_id '; output;end;' ;
*if last_cond=0 then put 'else';
run;
options source2;
data want_and;
set fulldata;
%include tmp;
run;
proc sort data=want_and; by condition_id date;run;
*look into the temfile, the second code is like that;
data want2;
set fulldata;
if (a =4 and b =5) then do; condition_id=1 ; output;end;
if (c =1 and d =2) then do; condition_id=2 ; output;end;
run;
******************************************************************************************
*Combination of OR condition1 and condition2;
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if (' cond1 '=' value1 'OR ' cond2 '=' value2 ')'
' then do; condition_id=' condition_id '; output;end;' ;
*if last_cond=0 then put 'else';
run;
options source2;
data want_or;
set fulldata;
%include tmp;
run;
proc sort; by condition_id date;run;
*look into the temfile, the second code is like that;
data want2;
set fulldata;
if (a =4 OR b =5) then do; condition_id=1 ; output;end;
if (c =1 OR d =2) then do; condition_id=2 ; output;end;
run;
proc sort; by condition_id date;run;
Use sql "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:
proc sql noprint;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from condition;
quit;
%put &=cond;
data want;
set fulldata;
where &cond ;
run;
And change the "AND" to "OR" in the select distinct expression to satisfy your 2nd requirement.
Use sql "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:
proc sql noprint;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from condition;
quit;
%put &=cond;
data want;
set fulldata;
where &cond ;
run;
And change the "AND" to "OR" in the select distinct expression to satisfy your 2nd requirement.
Thank you for your help.
I wonder if there is any way to get the Condition_ID in the want file?
HHC
Since my condition file is large, I create the below marco to run 1 condition at a time.
I wonder how I can tell SAS to do the: DO i=1 to "END of File" in the first step.
Also, is there any way to include the condition_id from the original file into the Want file?
Thank you for your help.
HHC
%macro cond;
%do i=1 %to 2; *HOW TO MAKE IT: i=1 to END OF FILE ;
data single_condition; set condition;
if _N_=&i;
run;
proc sql ;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from single_condition;
quit;
%put &=cond;
data want_temp;
set fulldata;
where &cond ;
condition_id=&i; *I have to get the Condition_ID manually instead of pulling it from original file;
run;
%if &i=1 %then %do;
data want; set want_temp;
%end;
%else %do;
data want; set want want_temp;
%end;
%end;
%mend;
%cond;run;
If (1) there are too many conditions in the CONDITION datast, and (2) you want to label conditions, then you could forget the SQL approach and use a DATA _NULL_ step to write out a series of statements like
if x1=1 and y1=1 then condition_id=1;
else if x2=12and y2=12then condition_id=2;
for use in the "data want" step. Then a subsetting if "if condition_id>=1" gets you what you want. There is definitely no need to iterate through the conditions, assuming they are mutually exclusive:
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then condition_id=' condition_id ';' @;
if last_cond=0 then put 'else';
run;
options source2;
data want;
set fulldata;
%include tmp;
if condition_id >=1;
run;
Notes:
Hi MKeintz,
The current code eliminates duplicate record from Fulldata.
I actually want to have that duplicate for each condition it satisfies.
In the below data, I add the last record (15) and it should be in both condition.
I am not sure how to change your code to get it.
Can you help me?
Thank you,
HCC
data condition;
input condition_ID cond1 :$1. value1 cond2 :$1. value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
15 4 5 1 2
;run;
To get such "duplicates" change these two lines in the first data step:
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then condition_id=' condition_id ';' @;
if last_cond=0 then put 'else';
to this line:
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then do; condition_id=' condition_id '; output;end;' ;
And in the second data step, remove this line:
if condition_id >=1;
Result: if a single record satisfies multiple conditions, it will be output multiple times.
It works now, Mkeintz!!!
it is a powerful approach, I don't know that kind of method before.
I can read the tmp file to see what the "if then" command.
So basically, the %include tmp will turn into this
data want;
set fulldata;
if a =4 and b =5 then condition_id=1 ;if c =1 and d =2 then condition_id=2 ;
run;
But where the Merge happen? I do not see something like that. Where it is built in your code?
data want1;
set fulldata;
if a =4 and b =5 ;
run;
data want2;
set fulldata;
if c =1 and d =2 ;
run;
data want; set want1 want2;run;
xxx
@hhchenfx wrote:
It works now, Mkeintz!!!
it is a powerful approach, I don't know that kind of method before.
I can read the tmp file to see what the "if then" command.
So basically, the %include tmp will turn into this
data want; set fulldata; if a =4 and b =5 then condition_id=1 ;if c =1 and d =2 then condition_id=2 ; run;
But where the Merge happen? I do not see something like that. Where it is built in your code?
data want1; set fulldata; if a =4 and b =5 ; run; data want2; set fulldata; if c =1 and d =2 ; run; data want; set want1 want2;run;
No. Itr will turn into:
data want;
set fulldata;
if a=4 and b=5 then do; condition_id=1; output; end;
if c=1 and d=2 then do; condition_id=2; output; end;
run;
Notice there is an OUTPUT statement for every successful IF test.
that's output command is so good.
Thanks,
HHC
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.