Create summary for each combination of different set of variables

Accepted Solution Solved
Reply
Super Contributor
Posts: 371
Accepted Solution

Create summary for each combination of different set of variables

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and different family of condition variables:

A1, A2 ... ;

B1, B2 ... Bn;

C1…Cn.

I have a code below that create a summary of number of taget=0 and target=1 for each combination of variable and value say

A1B1C1 (0, 0,0),           A1B1C1 (0, 0, 1)  …..

A1B1C2 (0, 0,0),           A1B1C2 (0, 0, 1)  …..

….

And then get the ratio of there 2 number.

This code only get 1 variable from 1 family.

Now, I am trying to modify the code so that the output file will create combination within each family, up to a chosen value of combination. Such as:

A1        B1        C1

A1A2    B1        C1                    (family A has 2 level of combination)

A1        B1        C1

A1        B1B2    C1

A1        B1B2B3            C1        (family B has 3 level of combination)

….

A1A2    B1B2    C1

A1A2    B1BB3 C1                   

I should be able to dictate the maximum within family combination.

Any help is very much appreciated.

Thank you,

HHC

data have;                                                                                                                             

   input target a1-a3 b1-b3 c1-c3;                                                                                                     

   datalines;                                                                                                                          

0 1 1 1 1 1 1 2 3 4                                                                                                                    

0 0 0 1 1 0 0 1 2 4                                                                                                                    

1 0 0 1 1 0 0 1 2 4                                                                                                                    

1 1 0 0 0 1 1 1 2 1                                                                                                                    

1 0 0 0 1 0 1 2 0 2                                                                                                                    

0 1 1 1 1 0 0 1 3 0                                                                                                                    

0 0 0 0 1 0 0 1 2 3                                                                                                                    

1 0 1 1 0 1 0 0 0 1                                                                                                                    

0 1 0 0 0 0 0 1 1 1                                                                                                                    

;;;;                                                                                                                                   

   run;                                                                                                                                

                                                                                                                                       

data temp(keep=_a_: _b_: _c_: target);                                                                                                 

set have;                                                                                                                             

length _a_name _b_name _c_name $ 20;                                                                                                  

array _a{*} a:;                                                                                                                       

array _b{*} b:;                                                                                                                       

array _c{*} c:;                                                                                                                       

do i=1 to dim(_a);                                                                                                                    

  do j=1 to dim(_b);                                                                                                                   

   do k=1 to dim(_b);                                                                                                                  

   _a_name=vname(_a{i}); _b_name=vname(_b{j}); _c_name=vname(_c{k});                                                                   

   _a_value=_a{i}; _b_value=_b{j}; _c_value=_c{k};                                                                                     

                                                                                                                                       

   output;                                                                                                                             

  end;                                                                                                                                 

end;                                                                                                                                   

end;                                                                                                                                   

run;                                                                                                                                   

                                                                                                                                       

proc sql;                                                                                                                              

create table want as                                                                                                                   

select _a_name,_b_name, _c_name,_a_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio                                      

  from temp                                                                                                                            

   group by  _a_name,_b_name,_c_name,_a_value,_b_value, _c_value;                                                                      

quit;                                       


Accepted Solutions
Solution
‎05-04-2014 10:33 AM
Super User
Posts: 9,682

Re: Create summary for each combination of different set of variables

My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

and if you want every family at least has one member , use this code:

if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; 

Message was edited by: xia keshan

View solution in original post


All Replies
Super User
Posts: 17,864

Re: Create summary for each combination of different set of variables

Look into the class, ways and type option on proc means instead.

That's a weird ratio calculation, its usually target = 1/ count(all obs in calculation).

Assuming your definition is correct, the following may help you get started.

If you post sample output it will be easier to help.

proc means data=have noprint;

class a1 a2 a3 b1 b2 b3 c1 c2 c3 target;

var target;

output out=summary n(target)=target;

run;

You can calculate the ratio in a second step.

Super Contributor
Posts: 371

Re: Create summary for each combination of different set of variables

Thank you, Reeza for your suggestion.

I did try proc means and proc summary. However, there is limitation in terms of the number of variables in class statement.

That's why I try this method.

HHC

Super User
Posts: 9,682

Re: Create summary for each combination of different set of variables

Graycode() is an excellent choice.

data have;                                                                                                                             
   input target a1-a3 b1-b3 c1-c3;                                                                                                     
   datalines4;                                                                                                                          
0 1 1 1 1 1 1 2 3 4                                                                                                                    
0 0 0 1 1 0 0 1 2 4                                                                                                                    
1 0 0 1 1 0 0 1 2 4                                                                                                                    
1 1 0 0 0 1 1 1 2 1                                                                                                                    
1 0 0 0 1 0 1 2 0 2                                                                                                                    
0 1 1 1 1 0 0 1 3 0                                                                                                                    
0 0 0 0 1 0 0 1 2 3                                                                                                                    
1 0 1 1 0 1 0 0 0 1                                                                                                                    
0 1 0 0 0 0 0 1 1 1                                                                                                                    
;;;;                                                                                                                                   
   run;                                                                                                                                
 


data want(keep=target name value);
set have;
length name value $ 40;
array x[9];
array y[9] a1-a3 b1-b3 c1-c3;
n=dim(x);
k=-1;
nsubs=2**n;
do i=1 to nsubs;
rc=graycode(k, of x
  • ); do j=1 to n;   if x=1 then do;      name=catx(' ',name,vname(y));       value=catx(' ',value,y);    end; end; if find(name,'a','i') and find(name,'b','i') and find(name,'c','i') then output; call missing(name,value); end; run;
  • Xia Keshan

    Super Contributor
    Posts: 371

    Re: Create summary for each combination of different set of variables

    Hi Ksharp,

    You code create full combination which include a1a2a3 b1b2b3 c1c2c3. However, I would like to create maximum 2 combination within family a; and maximum 3 combination within family b.

    So the "longest" name should be a1a2 b1b2b3 c1; a1a3 b1b2b3 c1; ....

    Also the output file should contain the number of target=0, number of target=1 for each combination so that I can calculate the ratio.

    I am still struggle with it.

    Thank you for your help.

    HHC

    Super User
    Posts: 9,682

    Re: Create summary for each combination of different set of variables

    Sure. graycode() contains everything.

     
    data have;                                                                                                                             
       input target a1-a3 b1-b3 c1-c3;                                                                                                     
       datalines4;                                                                                                                          
    0 1 1 1 1 1 1 2 3 4                                                                                                                    
    0 0 0 1 1 0 0 1 2 4                                                                                                                    
    1 0 0 1 1 0 0 1 2 4                                                                                                                    
    1 1 0 0 0 1 1 1 2 1                                                                                                                    
    1 0 0 0 1 0 1 2 0 2                                                                                                                    
    0 1 1 1 1 0 0 1 3 0                                                                                                                    
    0 0 0 0 1 0 0 1 2 3                                                                                                                    
    1 0 1 1 0 1 0 0 0 1                                                                                                                    
    0 1 0 0 0 0 0 1 1 1                                                                                                                    
    ;;;;                                                                                                                                   
       run;                                                                                                                                
     
    
    
    data want(keep=target name value);
    set have;
    length name value $ 40;
    array x[9];
    array y[9] a1-a3 b1-b3 c1-c3;
    n=dim(x);
    k=-1;
    nsubs=2**n;
    do i=1 to nsubs;
    rc=graycode(k, of x
  • ); do j=1 to n;   if x=1 then do;      name=catx(' ',name,vname(y));       value=catx(' ',value,y);    end; end; if count(name,'a','i') le 2 and count(name,'b','i') le 3 and count(name,'c','i')=1 then output; call missing(name,value); end; run;
  • Xia Keshan

    Super Contributor
    Posts: 371

    Re: Create summary for each combination of different set of variables

    I still cannot get the count for number target=0 and number target =1?

    This new code is work nicely in terms of creating combination.

    Thank Ksharp.

    HHC

    Solution
    ‎05-04-2014 10:33 AM
    Super User
    Posts: 9,682

    Re: Create summary for each combination of different set of variables

    My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

    and if you want every family at least has one member , use this code:

    if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; 

    Message was edited by: xia keshan

    Super Contributor
    Posts: 371

    Re: Create summary for each combination of different set of variables

    Your are right, Ksharp.

    The new code is very powerful as it create 1 factor only, 2 factor and then combination cross family as planned.  I will study it.

    However, there is advantage of the original approach (If I remember correctly, you are the one who help me creating it).

    It create variables for name and variables for value. It helps me better in later analysis.

    So over the night, I found a solution to it, I add 1 more loop for a family for a level.

    The code below has 2 level for family a.

    data temp1(keep=_a1_: _a2_: _b_: _c_: target);                                                                                                

    set have;                                                                                                                            

    length _a1_name _a2_name _b_name _c_name $ 20;     

    array _a1{*} a:;

    array _a2{*} a:;

    array _b{*} b:;                                                                                                                      

    array _c{*} c:;

    do i=1 to dim(_a1); 

    do ii=1 to dim(_a2); 

      do j=1 to dim(_b);                                                                                                                  

       do k=1 to dim(_b);                                                                                                                 

       _a1_name=vname(_a1{i});_a2_name=vname(_a2{ii}); _b_name=vname(_b{j}); _c_name=vname(_c{k});                                                                  

       _a1_value=_a1{i};_a2_value=_a2{ii}; _b_value=_b{j}; _c_value=_c{k};                                                                                    

                                                                                                                                          

       output;                                                                                                                            

      end;                                                                                                                                

    end;                                                                                                                                   

    end;

    end;

    run;                                                                                                                                 

                                                                                                                                          

    proc sql;                                                                                                                             

    create table want1 as                                                                                                                  

    select _a1_name,_a2_name,_b_name, _c_name,_a1_value,_a2_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio                                     

      from temp1                                                                                                                           

       group by  _a1_name,_a2_name,_b_name,_c_name,_a1_value,_a2_value,_b_value, _c_value;                                                                     

    quit;   

    Super User
    Posts: 9,682

    Re: Create summary for each combination of different set of variables

    There is no need to use two array for the same set of variables.

    And I am wondering A1A1B1C1 is also the combination you want ?

    Xia Keshan

    Super Contributor
    Posts: 371

    Re: Create summary for each combination of different set of variables

    You are right, Xia.

    The A1A1 shouldn't be there. I should change to do ii=i+1 to dim(_a2);

    HHC

    ☑ This topic is SOLVED.

    Need further help from the community? Please ask a new question.

    Discussion stats
    • 10 replies
    • 382 views
    • 6 likes
    • 3 in conversation