Create summary for each combination of a set of variable

Accepted Solution Solved
Reply
Super Contributor
Posts: 406
Accepted Solution

Create summary for each combination of a set of variable

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and 2 sets of condition (each take value of 0 or 1): A1, A2 ... An AND B1, B2 ... Bn.

I want to create a summary of number of taget=0 and target=1 for each combination of the 2 set of condition

say A1B1 (00, 01, 10, 11), A1B2...

I can write the simple macro to deal with them 1 by 1 as below. But if the number of condition getting large, just this macro is not helpful.

So I guess what I look for is a kind of "do for A=A1 to An" and another "do for B=B1 to Bn".

Any help is very much appreciated.

Thank you,

HHC

data have;

  input target a1 a2 a3 b1 b2 b3;

  datalines;

0 1 1 1 1 1 1

1 0 0 1 1 0 0

1 1 0 0 0 1 1

1 0 0 0 1 0 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

;run;

proc sort data=have;

by a1 b1 target ;run;

proc means noprint;

by a1 b1 target;

output out=summary_final (drop = _type_ _freq_)

n=n_a1b1; run;

data summary_final; set summary_final;

n=_n_; run; *this variable will be used to merge with the macro output below;

%macro analyze (X=, Y=);

proc sort data=have;

by &X &Y target ;run;

proc means noprint;

by &X &Y target;

output out=summary_macro (drop = _type_ _freq_)

n=n_&X&Y; run;

data summary_macro; set summary_macro;

n=_n_; run; *this variable will be used to merge with the macro output below;

proc sql;

create table summary_final as select a.*, b.n_&X&Y

from summary_final as a left join summary_macro as b

on a.n=b.n;quit;

%mend;

%analyze (X=a1,Y=b2);

%analyze (X=a1,Y=b3);

.....


Accepted Solutions
Solution
‎06-25-2013 11:45 PM
Super User
Posts: 10,018

Re: Create summary for each combination of a set of variable

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

View solution in original post


All Replies
Respected Advisor
Posts: 3,799

Re: Create summary for each combination of a set of variable

CLASS and TYPES.

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
proc summary data=have chartype;
  
class a: b: target;
   types (aSmiley Happy*(bSmiley Happy*target;
   output  out=sumtest;
   run;
Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Posted in reply to data_null__

Wow!!! thank you, Data_null_.

I am really amazed to see your code.

I still want to do a number of other step for each combination and for illustration purpose, I skip some of them (if I had known of the Class and Type, I would have include those steps).

So I wonder if you guys could have other solution so that I can have more flexibility in term of data manipulating.

Thank you.

HHC

Super User
Posts: 11,337

Re: Create summary for each combination of a set of variable

The proc summary output is pretty flexible for selecting the combination types you are looking for with by using the _TYPE_ automatic variable.

Super User
Posts: 19,768

Re: Create summary for each combination of a set of variable

You can also look into the classdata= option, where you can specify the specific combinations that you're looking for, rather than all.

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Posted in reply to data_null__

Hi Data_null_,

If the list of condition is only A1 A2 A3... An and I want to create the summary of Target for each pair say A1A2; A1A3 ....

This code below can be used but it (a b c target) the same.

Is there any better option to do it?

Thank you,

HHC

proc summary data=have2 chartype;

class a b c: target;

ways 3;

output  out=sumtest5;

run;

Super User
Posts: 10,018

Re: Create summary for each combination of a set of variable

What output do you like ?

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;

proc freq data=temp noprint;
table _a_name*_b_name*_a_value*_b_value*target/list out=want;
run;



Ksharp

Message was edited by: xia keshan

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Thank you for your different approaches.

At first, I was looking for the code by Ksharp. Now since reading Data_null_ and other, I find it more convenient to use Proc Summary.

The ultimate goal of course is to calculate the ratio of target=1/target=0 .  I use the below lag code to calculate this ratio with the condition of the same type.

 

data sumtest; set sumtest;

  drop lf lt ltype;

  lf=lag(freq);

  lt=lag(target);

  ltype=lag(type);

   if target=1 and lt=0 and type=ltype then ratio=freq/lf; run;

Because there is no guarantee that all combinations of condition has a value, ONE potential problem is that: I might take the (number of target=1) for 1 condition and divide by (number of target=0) for another condition.

it is the situation (with the below data) for a=0, c=0  AND a=0,c=1.

So I wonder if there is any solution to that.

Thank you for your support.

HHC

data
have;

     input
target a b c d e f ;

     datalines;

0 1 1 1 1 1 1

  1 0 0 1 1 0 0

1 1 0 0 0 1 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

1 1 1 1 1 1 1

  ;;;;

    run;

proc summary data=have chartype;

class a b c target;

ways 3;

output  out=sumtest;

run;

Super User
Posts: 10,018

Re: Create summary for each combination of a set of variable

Post the final output you need and explain your logic .

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable


a



b



c



target



_TYPE_



_FREQ_



ratio





0


0



0



0111



2







0



0



1



0111



1



  1. 0.5



0



1



1



0111



1







1



1



0



0111



2







1



1



1



0111



2



1



0





0



0



1011



1





0





1



1



1011



2



2



1





0



0



1011



1





1





0



1



1011



1



1



1





1



0



1011



2





1





1



1



1011



1



  1. 0.5

0



0





0



1101



1





0



0





1



1101



1



1



0



1





1



1101



1





1



0




0



1101



1





1



0





1



1101



1



1



1



1





0



1101



2





1



1





1



1101



1



  1. 0.5

0



0



0





1110



1





0



0



1





1110



1





0



1



1





1110



1





1



0



0





1110



2





1



1



1




1110



3




The output I need is the table above BUT the ratio highlighted should NOT be there since it compares the _FREQ_ of Target for 2 different combinations  (a=0, c=0) and (a=0, c=1). Ideally, we should have 1 line in between those highlighted for a=0, c=0 and Target=1.

(not sure why table inflated to that big)

Thank you,

HHC

Solution
‎06-25-2013 11:45 PM
Super User
Posts: 10,018

Re: Create summary for each combination of a set of variable

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Thank you, Ksharp.

First, I find that the temp file creates 17 duplicate observation. I am not sure what cause it.

sort data=temp out=temp1 nodupkey;

target _a_name _b_name _a_value _b_value; run;

Second, if there is only 1 list of condition (a b c d e f) instead of a1-a3 and b1-b3, and I want to do the combination of this list with the following code, there are still a lot of duplicate observation.

definitely, the problem is kind of AB and then BA.

I don't know how to eliminate this. I wonder if you could help me.

Thank you,

HHC

data temp1(keep=_a_: _b_: target);
set have;

array _a{*}  a b c;
  do i=1 to dim(_a);
  do j=i+1 to dim(_a);

   if _a{i}^=_b{j} then do;

   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
end;
run;

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Sorry Ksharp.

I think I am wrong somewhere. Your code is perfect.

Might be it is too much for me today.

Thank you,

HHC

Super Contributor
Posts: 406

Re: Create summary for each combination of a set of variable

Learned it now Ksharp!

Many thanks for a new method (to me).

HHC

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 13 replies
  • 373 views
  • 6 likes
  • 5 in conversation