Solved: Re: Create summary for each combination of a set of variable

hhchenfx · Posted 06-25-2013 04:24 PM

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and 2 sets of condition (each take value of 0 or 1): A1, A2 ... An AND B1, B2 ... Bn.

I want to create a summary of number of taget=0 and target=1 for each combination of the 2 set of condition

say A1B1 (00, 01, 10, 11), A1B2...

I can write the simple macro to deal with them 1 by 1 as below. But if the number of condition getting large, just this macro is not helpful.

So I guess what I look for is a kind of "do for A=A1 to An" and another "do for B=B1 to Bn".

Any help is very much appreciated.

Thank you,

HHC

data have;

input target a1 a2 a3 b1 b2 b3;

datalines;

0 1 1 1 1 1 1

1 0 0 1 1 0 0

1 1 0 0 0 1 1

1 0 0 0 1 0 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

;run;

proc sort data=have;

by a1 b1 target ;run;

proc means noprint;

by a1 b1 target;

output out=summary_final (drop = _type_ _freq_)

n=n_a1b1; run;

data summary_final; set summary_final;

n=_n_; run; *this variable will be used to merge with the macro output below;

%macro analyze (X=, Y=);

proc sort data=have;

by &X &Y target ;run;

proc means noprint;

by &X &Y target;

output out=summary_macro (drop = _type_ _freq_)

n=n_&X&Y; run;

data summary_macro; set summary_macro;

n=_n_; run; *this variable will be used to merge with the macro output below;

proc sql;

create table summary_final as select a.*, b.n_&X&Y

from summary_final as a left join summary_macro as b

on a.n=b.n;quit;

%mend;

%analyze (X=a1,Y=b2);

%analyze (X=a1,Y=b3);

.....

Ksharp · Posted 06-25-2013 11:45 PM

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

View solution in original post

data_null__ · Posted 06-25-2013 04:52 PM

CLASS and TYPES.

data have;
   input target a1-a3 b1-b3;
   datalines; 
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run; 
proc summary data=have chartype; 
   class a: b: target;
   types (a:)*(b:)*target;
   output  out=sumtest;
   run;

hhchenfx · Posted 06-25-2013 05:16 PM

Wow!!! thank you, Data_null_.

I am really amazed to see your code.

I still want to do a number of other step for each combination and for illustration purpose, I skip some of them (if I had known of the Class and Type, I would have include those steps).

So I wonder if you guys could have other solution so that I can have more flexibility in term of data manipulating.

Thank you.

HHC

ballardw · Posted 06-25-2013 05:37 PM

The proc summary output is pretty flexible for selecting the combination types you are looking for with by using the _TYPE_ automatic variable.

Reeza · Posted 06-25-2013 06:17 PM

You can also look into the classdata= option, where you can specify the specific combinations that you're looking for, rather than all.

hhchenfx · Posted 06-25-2013 09:30 PM

Hi Data_null_,

If the list of condition is only A1 A2 A3... An and I want to create the summary of Target for each pair say A1A2; A1A3 ....

This code below can be used but it (a b c target) the same.

Is there any better option to do it?

Thank you,

HHC

proc summary data=have2 chartype;

class a b c: target;

ways 3;

output out=sumtest5;

run;

Ksharp · Posted 06-25-2013 10:05 PM

What output do you like ?

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;

proc freq data=temp noprint;
table _a_name*_b_name*_a_value*_b_value*target/list out=want;
run;

Ksharp

Message was edited by: xia keshan

hhchenfx · Posted 06-25-2013 10:49 PM

Thank you for your different approaches.

At first, I was looking for the code by Ksharp. Now since reading Data_null_ and other, I find it more convenient to use Proc Summary.

The ultimate goal of course is to calculate the ratio of target=1/target=0 . I use the below lag code to calculate this ratio with the condition of the same type.

data sumtest; set sumtest;

drop lf lt ltype;

lf=lag(freq);

lt=lag(target);

ltype=lag(type);

if target=1 and lt=0 and type=ltype then ratio=freq/lf; run;

Because there is no guarantee that all combinations of condition has a value, ONE potential problem is that: I might take the (number of target=1) for 1 condition and divide by (number of target=0) for another condition.

it is the situation (with the below data) for a=0, c=0 AND a=0,c=1.

So I wonder if there is any solution to that.

Thank you for your support.

HHC

data
have;

input
target a b c d e f ;

datalines;

0 1 1 1 1 1 1

1 0 0 1 1 0 0

1 1 0 0 0 1 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

1 1 1 1 1 1 1

;;;;

run;

proc summary data=have chartype;

class a b c target;

ways 3;

output out=sumtest;

run;

Ksharp · Posted 06-25-2013 11:01 PM

Post the final output you need and explain your logic .

hhchenfx · Posted 06-25-2013 11:24 PM

a	b	c	target	_TYPE_	_FREQ_	ratio
	0	0	0	0111	2
	0	0	1	0111	1	0.5
	0	1	1	0111	1
	1	1	0	0111	2
	1	1	1	0111	2	1
0		0	0	1011	1
0		1	1	1011	2	2
1		0	0	1011	1
1		0	1	1011	1	1
1		1	0	1011	2
1		1	1	1011	1	0.5
0	0		0	1101	1
0	0		1	1101	1	1
0	1		1	1101	1
1	0		0	1101	1
1	0		1	1101	1	1
1	1		0	1101	2
1	1		1	1101	1	0.5
0	0	0		1110	1
0	0	1		1110	1
0	1	1		1110	1
1	0	0		1110	2
1	1	1		1110	3

The output I need is the table above BUT the ratio highlighted should NOT be there since it compares the _FREQ_ of Target for 2 different combinations (a=0, c=0) and (a=0, c=1). Ideally, we should have 1 line in between those highlighted for a=0, c=0 and Target=1.

(not sure why table inflated to that big)

Thank you,

HHC

Ksharp · Posted 06-25-2013 11:45 PM

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

hhchenfx · Posted 06-26-2013 12:28 AM

Thank you, Ksharp.

First, I find that the temp file creates 17 duplicate observation. I am not sure what cause it.

sort data=temp out=temp1 nodupkey;

target _a_name _b_name _a_value _b_value; run;

Second, if there is only 1 list of condition (a b c d e f) instead of a1-a3 and b1-b3, and I want to do the combination of this list with the following code, there are still a lot of duplicate observation.

definitely, the problem is kind of AB and then BA.

I don't know how to eliminate this. I wonder if you could help me.

Thank you,

HHC

data temp1(keep=_a_: _b_: target);
set have;

array _a{*} a b c;
do i=1 to dim(_a);
do j=i+1 to dim(_a);

if _a{i}^=_b{j} then do;

   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
end;
end;
end;
run;

hhchenfx · Posted 06-26-2013 12:37 AM

Sorry Ksharp.

I think I am wrong somewhere. Your code is perfect.

Might be it is too much for me today.

Thank you,

HHC

hhchenfx · Posted 06-26-2013 01:29 PM

Learned it now Ksharp!

Many thanks for a new method (to me).

HHC

Register Today!

SAS Training: Just a Click Away