Hi Everyone,
I have 2 data files, one is original and one is list of condition/filter.
The original has id, target (take value 0 or 1) and a number of variables.
The filter list the condition for a given set of variables(say var1 =0 and var2=1).
I want to see how many record in the original file having target =1 and how many having target =0
IF filter1 OR filter2 is met.
So for that combination of filter1 OR filter2, there are 2 record with target=0 and 1 record with target=1.
I intend to create combinations of up to 4 filters
I want the output file look like:
1st_filter 2nd_filter 3rd_filter 4th_Fitler N_target0 Ntarget1
1 2 . . 2 1
It is so challenging and I still cannot handle it.
Any help is very much appreciate.
(I personally prefer to use the DO LOOP approach)
Many thanks.
data originial;
input id target var1 var2 var3 var4 var5;
datalines;
1 0 0 1 2 4 0
2 0 0 1 2 4 1
3 1 0 1 4 6 0
4 1 0 8 6 9 8
5 1 1 3 6 8 11
6 1 8 6 5 3 4
7 0 9 6 5 3 4
;
data filter;
input filter_id cond1 $ value1 cond2 $ value2 cond3 $ value3;
datalines;
1 var1 0 var2 1 var3 2
2 var1 0 var5 0 . .
3 var2 1 var4 4 . .
4 var4 3 var5 4 var1 6
5 var1 3 var5 4 var1 6
;
...
WOW. So it is going to be cond1=value1 and cond2=value2 and ........ for each of filter condition ? Let me know if the following code is working . It is only for two levels
data originial; input id target var1 var2 var3 var4 var5; datalines; 1 0 0 1 2 4 0 2 0 0 1 2 4 1 3 1 0 1 4 6 0 4 1 0 8 6 9 8 5 1 1 3 6 8 11 6 1 8 6 5 3 4 7 0 9 6 5 3 4 ; run; data filter; input filter_id cond1 $ value1 cond2 $ value2 cond3 $ value3; datalines; 1 var1 0 var2 1 var3 2 2 var1 0 var5 0 . . 3 var2 1 var4 4 . . 4 var4 3 var5 4 var1 6 5 var1 3 var5 4 var1 6 ; run ; %let dsid=%sysfunc(open(filter)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsid=%sysfunc(close(&dsid)); %put &nobs; data combine(drop=filter_id i j); if 0 then set filter(rename=( cond1=cond11 value1=value11 cond2=cond21 value2=value21 cond3=cond31 value3=value31)); declare hash ha1(dataset:'filter(rename=( cond1=cond11 value1=value11 cond2=cond21 value2=value21 cond3=cond31 value3=value31))'); ha1.definekey('filter_id'); ha1.definedata(all:'y'); ha1.definedone(); if 0 then set filter(rename=( cond1=cond12 value1=value12 cond2=cond22 value2=value22 cond3=cond32 value3=value32)); declare hash ha2(dataset:'filter(rename=( cond1=cond12 value1=value12 cond2=cond22 value2=value22 cond3=cond32 value3=value32))'); ha2.definekey('filter_id'); ha2.definedata(all:'y'); ha2.definedone(); do i=1 to %eval(&nobs-1) ; ha1.find(key: i); do j=i+1 to &nobs ; ha2.find(key: j); output; end; end; stop; run; data want(drop=id target var: satisfy1 satisfy2 i); if _n_ eq 1 then do; if 0 then set originial; declare hash ha(dataset:'originial'); declare hiter hi('ha'); ha.definekey('id'); ha.definedata(all:'y'); ha.definedone(); end; set combine; array con1{*} $ cond11 cond21 cond31; array val1{*} value11 value21 value31 ; array con2{*} $ cond12 cond22 cond32; array val2{*} value12 value22 value32 ; filter1=.; filter2=.; N_target0=.; N_target1=.; do while(hi.next()=0); satisfy1=1; do i=1 to dim(con1); if not missing(con1{i}) then do; if vvaluex(con1{i}) ne val1{i} then do;satisfy1=0; leave; end; end; end; if satisfy1 then filter1+1; satisfy2=1; do i=1 to dim(con2); if not missing(con2{i}) then do; if vvaluex(con2{i}) ne val2{i} then do;satisfy2=0; leave; end; end; end; if satisfy2 then filter2+1; if satisfy1 or satisfy2 then do; if target eq 0 then N_target0+1; else if target eq 1 then N_target1+1; end; end; run;
Xia Keshan
Hi,
Sorry, its really not clear to me what you want to achieve here. Do you mean that you want to loop over the filter dataset, something like:
data filter;
if ((cond1) = (value1)) and ((var2) = (value2)) and ((var3) = (value3)) in original then add 1 to result for this condition?
If so then you are going to have to look at either a macro or generated code, for example:
data original;
input id target var1 var2 var3 var4 var5;
datalines;
1 0 0 1 2 4 0
2 0 0 1 2 4 1
3 1 0 1 4 6 0
4 1 0 8 6 9 8
5 1 1 3 6 8 11
6 1 8 6 5 3 4
7 0 9 6 5 3 4
;
run;
data filter;
input filter_id cond1 $ value1 cond2 $ value2 cond3 $ value3;
datalines;
1 var1 0 var2 1 var3 2
2 var1 0 var5 0 . .
3 var2 1 var4 4 . .
4 var4 3 var5 4 var1 6
5 var1 3 var5 4 var1 6
;
run;
proc sql; create table RESULTS (COND char(2000),RESULT num); quit;
data _null_;
set filter;
call execute('proc sql;
insert into RESULTS
set COND="('||strip(cond1)||') = ('||put(value1,best.)||') and ('||strip(cond2)||') = ('||put(value2,best.)||')",
RESULT=(select count(*)
from ORIGINAL
where '||strip(cond1)||'='||put(value1,best.)||' and '||strip(cond2)||'='||put(value2,best.)||');
quit;');
run;
OK. Here is . Let me know if it worked.
data originial; input id target var1 var2 var3 var4 var5; datalines; 1 0 0 1 2 4 0 2 0 0 1 2 4 1 3 1 0 1 4 6 0 4 1 0 8 6 9 8 5 1 1 3 6 8 11 6 1 8 6 5 3 4 7 0 9 6 5 3 4 ; run; data filter; input filter_id cond1 $ value1 cond2 $ value2 cond3 $ value3; datalines; 1 var1 0 var2 1 var3 2 2 var1 0 var5 0 . . 3 var2 1 var4 4 . . 4 var4 3 var5 4 var1 6 5 var1 3 var5 4 var1 6 ; run ; data want(drop=id target var: i); if _n_ eq 1 then do; if 0 then set originial; declare hash ha(dataset:'originial'); declare hiter hi('ha'); ha.definekey('id'); ha.definedata(all:'y'); ha.definedone(); end; set filter; array con{*} cond:; array val{*} value:; N_target0=.; N_target1=.; do while(hi.next()=0); do i=1 to dim(con); if not missing(con{i}) then do; if vvaluex(con{i}) eq val{i} then do; if target eq 0 then N_target0+1; else if target eq 1 then N_target1+1; leave; end; end; end; end; run;
Xia Keshan
Thank you for your help.
I am sorry that I cannot make it clearer at the first place.
Let look at one simplest example.
I want to see how many record in original file that satisfy conditions listed in Filter_id=1 OR Filter_id=2.
of those record, how many with targert=0, how many with target =1.
So I create a 2 steps method that can help me do it.
First, create the combination of filter.
Second, loop the combination through the original file to count.
For the first step, this code below helps me to create combination of 2 filter.
For the second step, even though the idea is quite clear:
take record(i) in original file and check if it satisfy Filter1 OR filter2 listed in combination file. If yes, count target=0 OR=1.
It is on top of my head as to how to get it done!
One minor question is that: in step 1, I can only make 2 lelve of combination. How can I create say 4 level (say filter 1,2,4,5)
Please let me know if it is clear.
Many thanks.
HHC
data comb;
drop i id2;
set filter nobs=nobs;
i+1;
do j=i+1 to nobs;
set filter (rename=(filter_id=id2 cond1=c1 value1=v1 cond2=c2 value2=v2 cond3=c3 value3=v3)) point=j;
filter1st=filter_id;
filter2nd=id2;
output;
end;
run;
Hi Xia,
I think your code doesn't create the combination of filter.
I explain it better in the recent post.
Thank you for your help.
HHC
WOW. So it is going to be cond1=value1 and cond2=value2 and ........ for each of filter condition ? Let me know if the following code is working . It is only for two levels
data originial; input id target var1 var2 var3 var4 var5; datalines; 1 0 0 1 2 4 0 2 0 0 1 2 4 1 3 1 0 1 4 6 0 4 1 0 8 6 9 8 5 1 1 3 6 8 11 6 1 8 6 5 3 4 7 0 9 6 5 3 4 ; run; data filter; input filter_id cond1 $ value1 cond2 $ value2 cond3 $ value3; datalines; 1 var1 0 var2 1 var3 2 2 var1 0 var5 0 . . 3 var2 1 var4 4 . . 4 var4 3 var5 4 var1 6 5 var1 3 var5 4 var1 6 ; run ; %let dsid=%sysfunc(open(filter)); %let nobs=%sysfunc(attrn(&dsid,nlobs)); %let dsid=%sysfunc(close(&dsid)); %put &nobs; data combine(drop=filter_id i j); if 0 then set filter(rename=( cond1=cond11 value1=value11 cond2=cond21 value2=value21 cond3=cond31 value3=value31)); declare hash ha1(dataset:'filter(rename=( cond1=cond11 value1=value11 cond2=cond21 value2=value21 cond3=cond31 value3=value31))'); ha1.definekey('filter_id'); ha1.definedata(all:'y'); ha1.definedone(); if 0 then set filter(rename=( cond1=cond12 value1=value12 cond2=cond22 value2=value22 cond3=cond32 value3=value32)); declare hash ha2(dataset:'filter(rename=( cond1=cond12 value1=value12 cond2=cond22 value2=value22 cond3=cond32 value3=value32))'); ha2.definekey('filter_id'); ha2.definedata(all:'y'); ha2.definedone(); do i=1 to %eval(&nobs-1) ; ha1.find(key: i); do j=i+1 to &nobs ; ha2.find(key: j); output; end; end; stop; run; data want(drop=id target var: satisfy1 satisfy2 i); if _n_ eq 1 then do; if 0 then set originial; declare hash ha(dataset:'originial'); declare hiter hi('ha'); ha.definekey('id'); ha.definedata(all:'y'); ha.definedone(); end; set combine; array con1{*} $ cond11 cond21 cond31; array val1{*} value11 value21 value31 ; array con2{*} $ cond12 cond22 cond32; array val2{*} value12 value22 value32 ; filter1=.; filter2=.; N_target0=.; N_target1=.; do while(hi.next()=0); satisfy1=1; do i=1 to dim(con1); if not missing(con1{i}) then do; if vvaluex(con1{i}) ne val1{i} then do;satisfy1=0; leave; end; end; end; if satisfy1 then filter1+1; satisfy2=1; do i=1 to dim(con2); if not missing(con2{i}) then do; if vvaluex(con2{i}) ne val2{i} then do;satisfy2=0; leave; end; end; end; if satisfy2 then filter2+1; if satisfy1 or satisfy2 then do; if target eq 0 then N_target0+1; else if target eq 1 then N_target1+1; end; end; run;
Xia Keshan
I'm not with my PC right now.
Quite shocked to see the code.
I'll check it out asap.
Many thanks, Xia.
HHC
Thanks, Xia.
Your code is great.
As usual, it is way on top of my level.
HHC
Oh, finally, I finished the work using DO LOOP.
It is so painful.
HHC
data original;
input id target var1 var2 var3 var4 var5;
datalines;
1 0 0 1 2 4 0
2 0 0 1 2 4 1
3 1 0 1 4 6 0
4 1 0 8 6 9 8
5 1 1 3 6 8 11
6 1 8 6 5 3 4
7 0 9 6 5 3 4
8 2 9 6 5 3 4
9 4 9 6 5 3 4
;
data filter;
input filter_id cond1 $ value1 cond2 $ value2;
datalines;
1 var1 0 var2 1
2 var1 0 var5 0
3 var2 1 var4 4
5 var1 3 var5 4
6 var2 1 var4 4
7 var4 3 var5 4
8 var1 3 var5 4
;
run;
******************************************************************************
*STEP 1: CREATE 2-LEVEL COMBINATION;
******************************************************************************;
data comb2;
drop i id2 filter_id;
set filter nobs=nobs;
i+1;
do j=i+1 to nobs;
set filter (rename=(filter_id=id2 cond1=set2_name1 value1=set2_value1 cond2=set2_name2 value2=set2_value2)) point=j;
filter1st=filter_id;
filter2nd=id2;
output;
end;
run;
data comb2; set comb2;
rename cond1=set1_name1 value1=set1_value1 cond2=set1_name2 value2=set1_value2;
run;
******************************************************************************
*STEP 2: LOOP THROUGH THE ORIGINAL DATA
IF match1=2, it means first set of condition fully meet --> dont need to check SET2
If match2<2, move on to check SET2. IF fully match SET2, Match2=2
******************************************************************************;
data want3;
drop i1 j1 i2 j2 found1-found4;
set comb2;
*array for SET 1;
array set1_name{2} set1_name1 set1_name2;
array set1_value{2} set1_value1 set1_value2;
*array for SET 2;
array set2_name{2} set2_name1 set2_name2;
array set2_value{2} set2_value1 set2_value2;
*Analyze SET 1;
do p1=1 to nobs ;
match1=0;
set original point=p1 nobs=nobs;
array org_var{5} var1 var2 var3 var4 var5;
found1=0;
do i1=1 to 2 until (found1=1);
found2=0;
do j1=1 to 5 until (found2=1);
if set1_name{i1}=vname(org_var{j1}) and set1_value{i1}=org_var{j1} then do;match1=match1+1;found2=1;end;
*Once find the match, dont need to check further within the original record
but move to the next condition in the same SET (i1=2);
end;
if match1=2 then do; found1=1;match2=.;output;end; *If the first condition in the set fully satisfy, the whole filter is done-->move to the next filter_id;
end;
*if set 1 fail, go to set2;
IF match1<2 then
do;
found3=0;
match2=0;
do i2=1 to 2;
found4=0;
do j2=1 to 5 until (found4=1);
if set2_name{i2}=vname(org_var{j2}) and set2_value{i2}=org_var{j2} then do; match2=match2+1;found4=1;end;
end;
end;
*if match2=2 then output; *this only output those with match =2;
output;*this output everything, create a file of m*n;
end;
end;
run;
proc sort data=want3; by filter1st filter2nd id;run;
So what are you going to do for FOUR level ?
Hi Xia,
Thank you for checking my code.
I just did the 3 level.
HHC
**====================================================================================================;
******************************************************************************
*STEP 1: CREATE 3-LELVEL COMBINATION
******************************************************************************;
data comb3;
drop i id2;
set filter nobs=nobs;
i+1;
do j=i+1 to nobs;
set filter (rename=(filter_id=id2 cond1=set2_name1 value1=set2_value1 cond2=set2_name2 value2=set2_value2)) point=j;
do k=j+1 to nobs;
set filter (rename=(filter_id=id3 cond1=set3_name1 value1=set3_value1 cond2=set3_name2 value2=set3_value2)) point=k;
filter1st=filter_id;
filter2nd=id2;
filter3rd=id3;
output;
end;
end;
run;
data comb3; set comb3;
rename cond1=set1_name1 value1=set1_value1 cond2=set1_name2 value2=set1_value2;
run;
******************************************************************************
*STEP 2: LOOP THROUGH THE ORIGINAL DATA
IF match1=2, it means first set of condition fully meet --> dont need to check SET2
If match2<2, move on to check SET2. IF fully match SET2, Match2=2
******************************************************************************;
data want3;
drop i1 j1 i2 j2 found1-found4;
set comb3;
*array for SET 1;
array set1_name{2} set1_name1 set1_name2;
array set1_value{2} set1_value1 set1_value2;
*array for SET 2;
array set2_name{2} set2_name1 set2_name2;
array set2_value{2} set2_value1 set2_value2;
*array for SET 3;
array set3_name{2} set3_name1 set3_name2;
array set3_value{2} set3_value1 set3_value2;
*Analyze SET 1;
do p1=1 to nobs ;
match1=0;
set original point=p1 nobs=nobs;
array org_var{5} var1 var2 var3 var4 var5;
found1=0;
do i1=1 to 2 until (found1=1);
found2=0;
do j1=1 to 5 until (found2=1);
if set1_name{i1}=vname(org_var{j1}) and set1_value{i1}=org_var{j1} then do;match1=match1+1;found2=1;end;
*Once find the match, dont need to check further within the original record but move to the next condition in the same SET (i1=2);
end;
if match1=2 then do; found1=1;match2=.;match3=.;output;end; *If the first condition in the set fully satisfy, the whole filter is done-->move to the next filter_id;
end;
*if set 1 fail, go to set2;
IF match1<2 then
do;
found3=0;
match2=0;
do i2=1 to 2;
found4=0;
do j2=1 to 5 until (found4=1);
if set2_name{i2}=vname(org_var{j2}) and set2_value{i2}=org_var{j2} then do; match2=match2+1;found4=1;end;
*Once find the match, dont need to check further within the original record but move to the next condition in the same SET (i1=2);
end;
if match2=2 then do; found1=1;match3=.;output;end; *If the second condition in the set fully satisfy, the whole filter is done-->move to the next filter_id;
end;
end;
*if set 2 fail, go to set3;
IF match1<2 and match2<2 then *because I close the checking of each set right away, each checking is independent, that's why in need match1 match2<2;
do;
found5=0;
match3=0;
do i3=1 to 2;
found6=0;
do j3=1 to 5 until (found6=1);
if set3_name{i3}=vname(org_var{j3}) and set3_value{i3}=org_var{j3} then do; match3=match3+1;found6=1;end;
end;
end;
/*if match3=2 then do; found1=1; output; end;*this only output those with match =2;*/
output;*this output everything, create a file of m*n;
end;
end;
run;
proc sort data=want3; by filter1st filter2nd id;run;
**====================================================================================================;
******************************************************************************
*STEP 1: CREATE 4-LELVEL COMBINATION
******************************************************************************;
data comb4;
drop i id2 id3 id4 filter_id;
set filter nobs=nobs;
i+1;
do j=i+1 to nobs;
set filter (rename=(filter_id=id2 cond1=filter2_name1 value1=filter2_value1 cond2=filter2_name2 value2=filter2_value2)) point=j;
do k=j+1 to nobs;
set filter (rename=(filter_id=id3 cond1=filter3_name1 value1=filter3_value1 cond2=filter3_name2 value2=filter3_value2)) point=k;
do l=k+1 to nobs;
set filter (rename=(filter_id=id4 cond1=filter4_name1 value1=filter4_value1 cond2=filter4_name2 value2=filter4_value2)) point=l;
filter1st=filter_id;
filter2nd=id2;
filter3rd=id3;
filter4th=id4;
output;
end;
end;
end;
run;
data comb4; set comb4;
rename cond1=filter1_name1 value1=filter1_value1 cond2=filter1_name2 value2=filter1_value2;
run;
******************************************************************************
*STEP 2: LOOP THROUGH THE ORIGINAL DATA
IF match1=2, it means first filter of condition fully meet --> dont need to check filter2
If match2<2, move on to check filter2. IF fully match filter2, Match2=2
******************************************************************************;
data want4;
drop i1 j1 i2 j2 found1-found4;
set comb4;
*array for filter 1;
array filter1_name{2} filter1_name1 filter1_name2;
array filter1_value{2} filter1_value1 filter1_value2;
*array for filter 2;
array filter2_name{2} filter2_name1 filter2_name2;
array filter2_value{2} filter2_value1 filter2_value2;
*array for filter 3;
array filter3_name{2} filter3_name1 filter3_name2;
array filter3_value{2} filter3_value1 filter3_value2;
*array for filter 4;
array filter4_name{2} filter4_name1 filter4_name2;
array filter4_value{2} filter4_value1 filter4_value2;
*Analyze filter 1;
DO p1=1 to nobs ;
match1=0;
set original point=p1 nobs=nobs;
array org_var{5} var1 var2 var3 var4 var5;
found1=0;
do i1=1 to 2 until (found1=1);
found2=0;
do j1=1 to 5 until (found2=1);
if filter1_name{i1}=vname(org_var{j1}) and filter1_value{i1}=org_var{j1} then do;match1=match1+1;found2=1;end;
*Once find the match, dont need to check further within the original record but move to the next condition in the same filter (i1=2);
end;
if match1=2 then do; found1=1;match2=.;match3=.;output;end; *If the first condition in the filter fully satisfy, the whole filter is done-->move to the next filter_id;
end;
*if filter 1 fail, go to filter2;
IF match1<2 then
do;
found3=0;
match2=0;
do i2=1 to 2;
found4=0;
do j2=1 to 5 until (found4=1);
if filter2_name{i2}=vname(org_var{j2}) and filter2_value{i2}=org_var{j2} then do; match2=match2+1;found4=1;end;
*Once find the match, dont need to check further within the original record but move to the next condition in the same filter (i1=2);
end;
if match2=2 then do; found1=1;match3=.;output;end; *If the second condition in the filter fully satisfy, the whole filter is done-->move to the next filter_id;
end;
end;
*if filter 2 fail, go to filter3;
IF match1<2 and match2<2 then
do;/*because I close the checking of each filter right away, each checking is independent, that's why in need match1 match2<2*/
found5=0;
match3=0;
do i3=1 to 2;
found6=0;
do j3=1 to 5 until (found6=1);
if filter3_name{i3}=vname(org_var{j3}) and filter3_value{i3}=org_var{j3} then do; match3=match3+1;found6=1;end;
*Once find the match, dont need to check further within the original record but move to the next condition in the same filter (i1=2);
end;
if match3=2 then do; found1=1;match4=.;output;end; *If the second condition in the filter fully satisfy, the whole filter is done-->move to the next filter_id;
end;
end;
*if filter 3 fail, go to filter4;
IF match1<2 and match2<2 and match3<2 then
do;/*because I close the checking of each filter right away, each checking is independent, that's why in need match1 match2 match3<2*/
found7=0;
match4=0;
do i4=1 to 2;
found8=0;
do j4=1 to 5 until (found8=1);
if filter4_name{i4}=vname(org_var{j4}) and filter4_value{i4}=org_var{j4} then do; match4=match4+1;found8=1;end;
end;
end;
/*if match4=2 then do; found1=1; output; end;/*this only output those with match =2;*/
output;*this output everything, create a file of m*n;
end;
end;
run;
proc sort data=want4; by filter1st filter2nd id;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.