Hi Everyone,
I have a FULL_data file in which each date has a number of characteristics.
I have another condition file that specifies condition for a give 2 characteristic value.
For any condition in condition file, I want to pull all date from FULL_data that meet this condition.
(in my real work, the condition could be 4 or 5 characteristics )
I don't know how to get that kind of merge.
Any help is very much appreciated.
HHC
The WANT file
condition_ID |
condt1 |
value1 |
cond2 |
value2 |
date |
1 |
a |
4 |
b |
5 |
11 |
1 |
a |
4 |
b |
5 |
12 |
2 |
c |
1 |
d |
2 |
14 |
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
;run;
data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
*************************
Also, I want to have the WANT file for all records that meet 1 of the 2 conditions (OR), please give me a hint if it is not long.
Thank you.
*After getting the condition-combination file,
this code will help to merge the condition-combination file with the Full data (retunr) file
In the output file, each condition has all record that meet it. and can be use to analyze quality of condition;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 0 1 2 2
14 8 5 1 2
20 4 5 1 2
;run;
data condition;
input condition_ID cond1 $ value1 cond2 $ value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
******************************************************************************************
*Combination of AND condition1 and condition2;
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if (' cond1 '=' value1 'AND ' cond2 '=' value2 ')'
' then do; condition_id=' condition_id '; output;end;' ;
*if last_cond=0 then put 'else';
run;
options source2;
data want_and;
set fulldata;
%include tmp;
run;
proc sort data=want_and; by condition_id date;run;
*look into the temfile, the second code is like that;
data want2;
set fulldata;
if (a =4 and b =5) then do; condition_id=1 ; output;end;
if (c =1 and d =2) then do; condition_id=2 ; output;end;
run;
******************************************************************************************
*Combination of OR condition1 and condition2;
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if (' cond1 '=' value1 'OR ' cond2 '=' value2 ')'
' then do; condition_id=' condition_id '; output;end;' ;
*if last_cond=0 then put 'else';
run;
options source2;
data want_or;
set fulldata;
%include tmp;
run;
proc sort; by condition_id date;run;
*look into the temfile, the second code is like that;
data want2;
set fulldata;
if (a =4 OR b =5) then do; condition_id=1 ; output;end;
if (c =1 OR d =2) then do; condition_id=2 ; output;end;
run;
proc sort; by condition_id date;run;
Use sql "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:
proc sql noprint;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from condition;
quit;
%put &=cond;
data want;
set fulldata;
where &cond ;
run;
And change the "AND" to "OR" in the select distinct expression to satisfy your 2nd requirement.
Use sql "select into :macrovar" to construct the where condition from the CONDITION dataset. Then use it in a data step:
proc sql noprint;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from condition;
quit;
%put &=cond;
data want;
set fulldata;
where &cond ;
run;
And change the "AND" to "OR" in the select distinct expression to satisfy your 2nd requirement.
Thank you for your help.
I wonder if there is any way to get the Condition_ID in the want file?
HHC
Since my condition file is large, I create the below marco to run 1 condition at a time.
I wonder how I can tell SAS to do the: DO i=1 to "END of File" in the first step.
Also, is there any way to include the condition_id from the original file into the Want file?
Thank you for your help.
HHC
%macro cond;
%do i=1 %to 2; *HOW TO MAKE IT: i=1 to END OF FILE ;
data single_condition; set condition;
if _N_=&i;
run;
proc sql ;
select distinct '(' || catx('=',cond1,value1) || ' and ' || catx('=',cond2,value2) || ')'
into :cond
separated by ' or '
from single_condition;
quit;
%put &=cond;
data want_temp;
set fulldata;
where &cond ;
condition_id=&i; *I have to get the Condition_ID manually instead of pulling it from original file;
run;
%if &i=1 %then %do;
data want; set want_temp;
%end;
%else %do;
data want; set want want_temp;
%end;
%end;
%mend;
%cond;run;
If (1) there are too many conditions in the CONDITION datast, and (2) you want to label conditions, then you could forget the SQL approach and use a DATA _NULL_ step to write out a series of statements like
if x1=1 and y1=1 then condition_id=1;
else if x2=12and y2=12then condition_id=2;
for use in the "data want" step. Then a subsetting if "if condition_id>=1" gets you what you want. There is definitely no need to iterate through the conditions, assuming they are mutually exclusive:
filename tmp temp;
data _null_;
set condition end=last_cond;
file tmp;
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then condition_id=' condition_id ';' @;
if last_cond=0 then put 'else';
run;
options source2;
data want;
set fulldata;
%include tmp;
if condition_id >=1;
run;
Notes:
Hi MKeintz,
The current code eliminates duplicate record from Fulldata.
I actually want to have that duplicate for each condition it satisfies.
In the below data, I add the last record (15) and it should be in both condition.
I am not sure how to change your code to get it.
Can you help me?
Thank you,
HCC
data condition;
input condition_ID cond1 :$1. value1 cond2 :$1. value2;
datalines;
1 a 4 b 5
2 c 1 d 2
;run;
data FULLDATA;
input date a b c d ;
datalines;
11 4 5 1 1
12 4 5 2 5
13 4 1 2 6
14 8 3 1 2
15 4 5 1 2
;run;
To get such "duplicates" change these two lines in the first data step:
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then condition_id=' condition_id ';' @;
if last_cond=0 then put 'else';
to this line:
put 'if ' cond1 '=' value1 'and ' cond2 '=' value2
' then do; condition_id=' condition_id '; output;end;' ;
And in the second data step, remove this line:
if condition_id >=1;
Result: if a single record satisfies multiple conditions, it will be output multiple times.
It works now, Mkeintz!!!
it is a powerful approach, I don't know that kind of method before.
I can read the tmp file to see what the "if then" command.
So basically, the %include tmp will turn into this
data want;
set fulldata;
if a =4 and b =5 then condition_id=1 ;if c =1 and d =2 then condition_id=2 ;
run;
But where the Merge happen? I do not see something like that. Where it is built in your code?
data want1;
set fulldata;
if a =4 and b =5 ;
run;
data want2;
set fulldata;
if c =1 and d =2 ;
run;
data want; set want1 want2;run;
xxx
@hhchenfx wrote:
It works now, Mkeintz!!!
it is a powerful approach, I don't know that kind of method before.
I can read the tmp file to see what the "if then" command.
So basically, the %include tmp will turn into this
data want; set fulldata; if a =4 and b =5 then condition_id=1 ;if c =1 and d =2 then condition_id=2 ; run;
But where the Merge happen? I do not see something like that. Where it is built in your code?
data want1; set fulldata; if a =4 and b =5 ; run; data want2; set fulldata; if c =1 and d =2 ; run; data want; set want1 want2;run;
No. Itr will turn into:
data want;
set fulldata;
if a=4 and b=5 then do; condition_id=1; output; end;
if c=1 and d=2 then do; condition_id=2; output; end;
run;
Notice there is an OUTPUT statement for every successful IF test.
that's output command is so good.
Thanks,
HHC
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.