Hello Everyone,
I have the following problem and would like to seek your help.
My file has variable Target (taking value 0 or 1) and 2 sets of condition (each take value of 0 or 1): A1, A2 ... An AND B1, B2 ... Bn.
I want to create a summary of number of taget=0 and target=1 for each combination of the 2 set of condition
say A1B1 (00, 01, 10, 11), A1B2...
I can write the simple macro to deal with them 1 by 1 as below. But if the number of condition getting large, just this macro is not helpful.
So I guess what I look for is a kind of "do for A=A1 to An" and another "do for B=B1 to Bn".
Any help is very much appreciated.
Thank you,
HHC
data have;
input target a1 a2 a3 b1 b2 b3;
datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;run;
proc sort data=have;
by a1 b1 target ;run;
proc means noprint;
by a1 b1 target;
output out=summary_final (drop = _type_ _freq_)
n=n_a1b1; run;
data summary_final; set summary_final;
n=_n_; run; *this variable will be used to merge with the macro output below;
%macro analyze (X=, Y=);
proc sort data=have;
by &X &Y target ;run;
proc means noprint;
by &X &Y target;
output out=summary_macro (drop = _type_ _freq_)
n=n_&X&Y; run;
data summary_macro; set summary_macro;
n=_n_; run; *this variable will be used to merge with the macro output below;
proc sql;
create table summary_final as select a.*, b.n_&X&Y
from summary_final as a left join summary_macro as b
on a.n=b.n;quit;
%mend;
%analyze (X=a1,Y=b2);
%analyze (X=a1,Y=b3);
.....
data have; input target a1-a3 b1-b3; datalines; 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 ;;;; run; data temp(keep=_a_: _b_: target); set have; length _a_name _b_name $ 20; array _a{*} a:; array _b{*} b:; do i=1 to dim(_a); do j=1 to dim(_b); _a_name=vname(_a{i}); _b_name=vname(_b{j}); _a_value=_a{i}; _b_value=_b{j}; output; end; end; run; proc sql; create table want as select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per from temp group by _a_name,_b_name,_a_value,_b_value; quit;
Ksharp
CLASS and TYPES.
Wow!!! thank you, Data_null_.
I am really amazed to see your code.
I still want to do a number of other step for each combination and for illustration purpose, I skip some of them (if I had known of the Class and Type, I would have include those steps).
So I wonder if you guys could have other solution so that I can have more flexibility in term of data manipulating.
Thank you.
HHC
The proc summary output is pretty flexible for selecting the combination types you are looking for with by using the _TYPE_ automatic variable.
You can also look into the classdata= option, where you can specify the specific combinations that you're looking for, rather than all.
Hi Data_null_,
If the list of condition is only A1 A2 A3... An and I want to create the summary of Target for each pair say A1A2; A1A3 ....
This code below can be used but it (a b c target) the same.
Is there any better option to do it?
Thank you,
HHC
proc summary data=have2 chartype;
class a b c: target;
ways 3;
output out=sumtest5;
run;
What output do you like ?
data have; input target a1-a3 b1-b3; datalines; 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 ;;;; run; data temp(keep=_a_: _b_: target); set have; length _a_name _b_name $ 20; array _a{*} a:; array _b{*} b:; do i=1 to dim(_a); do j=1 to dim(_b); _a_name=vname(_a{i}); _b_name=vname(_b{j}); _a_value=_a{i}; _b_value=_b{j}; output; end; end; run; proc freq data=temp noprint; table _a_name*_b_name*_a_value*_b_value*target/list out=want; run;
Ksharp
Message was edited by: xia keshan
Thank you for your different approaches.
At first, I was looking for the code by Ksharp. Now since reading Data_null_ and other, I find it more convenient to use Proc Summary.
The ultimate goal of course is to calculate the ratio of target=1/target=0 . I use the below lag code to calculate this ratio with the condition of the same type.
data sumtest; set sumtest;
drop lf lt ltype;
lf=lag(freq);
lt=lag(target);
ltype=lag(type);
if target=1 and lt=0 and type=ltype then ratio=freq/lf; run;
Because there is no guarantee that all combinations of condition has a value, ONE potential problem is that: I might take the (number of target=1) for 1 condition and divide by (number of target=0) for another condition.
it is the situation (with the below data) for a=0, c=0 AND a=0,c=1.
So I wonder if there is any solution to that.
Thank you for your support.
HHC
data
have;
input
target a b c d e f ;
datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
1 1 1 1 1 1 1
;;;;
run;
proc summary data=have chartype;
class a b c target;
ways 3;
output out=sumtest;
run;
Post the final output you need and explain your logic .
a | b | c | target | _TYPE_ | _FREQ_ | ratio |
0 | 0 | 0 | 0111 | 2 | ||
0 | 0 | 1 | 0111 | 1 |
| |
0 | 1 | 1 | 0111 | 1 | ||
1 | 1 | 0 | 0111 | 2 | ||
1 | 1 | 1 | 0111 | 2 | 1 | |
0 | 0 | 0 | 1011 | 1 | ||
0 | 1 | 1 | 1011 | 2 | 2 | |
1 | 0 | 0 | 1011 | 1 | ||
1 | 0 | 1 | 1011 | 1 | 1 | |
1 | 1 | 0 | 1011 | 2 | ||
1 | 1 | 1 | 1011 | 1 |
| |
0 | 0 | 0 | 1101 | 1 | ||
0 | 0 | 1 | 1101 | 1 | 1 | |
0 | 1 | 1 | 1101 | 1 | ||
1 | 0 | 0 | 1101 | 1 | ||
1 | 0 | 1 | 1101 | 1 | 1 | |
1 | 1 | 0 | 1101 | 2 | ||
1 | 1 | 1 | 1101 | 1 |
| |
0 | 0 | 0 | 1110 | 1 | ||
0 | 0 | 1 | 1110 | 1 | ||
0 | 1 | 1 | 1110 | 1 | ||
1 | 0 | 0 | 1110 | 2 | ||
1 | 1 | 1 | 1110 | 3 |
The output I need is the table above BUT the ratio highlighted should NOT be there since it compares the _FREQ_ of Target for 2 different combinations (a=0, c=0) and (a=0, c=1). Ideally, we should have 1 line in between those highlighted for a=0, c=0 and Target=1.
(not sure why table inflated to that big)
Thank you,
HHC
data have; input target a1-a3 b1-b3; datalines; 0 1 1 1 1 1 1 1 0 0 1 1 0 0 1 1 0 0 0 1 1 1 0 0 0 1 0 1 0 1 1 1 1 0 0 0 0 0 0 1 0 0 1 0 1 1 0 1 0 0 1 0 0 0 0 0 ;;;; run; data temp(keep=_a_: _b_: target); set have; length _a_name _b_name $ 20; array _a{*} a:; array _b{*} b:; do i=1 to dim(_a); do j=1 to dim(_b); _a_name=vname(_a{i}); _b_name=vname(_b{j}); _a_value=_a{i}; _b_value=_b{j}; output; end; end; run; proc sql; create table want as select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per from temp group by _a_name,_b_name,_a_value,_b_value; quit;
Ksharp
Thank you, Ksharp.
First, I find that the temp file creates 17 duplicate observation. I am not sure what cause it.
sort data=temp out=temp1 nodupkey;
target _a_name _b_name _a_value _b_value; run;
Second, if there is only 1 list of condition (a b c d e f) instead of a1-a3 and b1-b3, and I want to do the combination of this list with the following code, there are still a lot of duplicate observation.
definitely, the problem is kind of AB and then BA.
I don't know how to eliminate this. I wonder if you could help me.
Thank you,
HHC
data temp1(keep=_a_: _b_: target);
set have;
array _a{*} a b c;
do i=1 to dim(_a);
do j=i+1 to dim(_a);
if _a{i}^=_b{j} then do;
_a_name=vname(_a{i}); _b_name=vname(_b{j});
_a_value=_a{i}; _b_value=_b{j};
output;
end;
end;
end;
run;
Sorry Ksharp.
I think I am wrong somewhere. Your code is perfect.
Might be it is too much for me today.
Thank you,
HHC
Learned it now Ksharp!
Many thanks for a new method (to me).
HHC
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.