BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and different family of condition variables:

A1, A2 ... ;

B1, B2 ... Bn;

C1…Cn.

I have a code below that create a summary of number of taget=0 and target=1 for each combination of variable and value say

A1B1C1 (0, 0,0),           A1B1C1 (0, 0, 1)  …..

A1B1C2 (0, 0,0),           A1B1C2 (0, 0, 1)  …..

….

And then get the ratio of there 2 number.

This code only get 1 variable from 1 family.

Now, I am trying to modify the code so that the output file will create combination within each family, up to a chosen value of combination. Such as:

A1        B1        C1

A1A2    B1        C1                    (family A has 2 level of combination)

A1        B1        C1

A1        B1B2    C1

A1        B1B2B3            C1        (family B has 3 level of combination)

….

A1A2    B1B2    C1

A1A2    B1BB3 C1                   

I should be able to dictate the maximum within family combination.

Any help is very much appreciated.

Thank you,

HHC

data have;                                                                                                                             

   input target a1-a3 b1-b3 c1-c3;                                                                                                     

   datalines;                                                                                                                          

0 1 1 1 1 1 1 2 3 4                                                                                                                    

0 0 0 1 1 0 0 1 2 4                                                                                                                    

1 0 0 1 1 0 0 1 2 4                                                                                                                    

1 1 0 0 0 1 1 1 2 1                                                                                                                    

1 0 0 0 1 0 1 2 0 2                                                                                                                    

0 1 1 1 1 0 0 1 3 0                                                                                                                    

0 0 0 0 1 0 0 1 2 3                                                                                                                    

1 0 1 1 0 1 0 0 0 1                                                                                                                    

0 1 0 0 0 0 0 1 1 1                                                                                                                    

;;;;                                                                                                                                   

   run;                                                                                                                                

                                                                                                                                       

data temp(keep=_a_: _b_: _c_: target);                                                                                                 

set have;                                                                                                                             

length _a_name _b_name _c_name $ 20;                                                                                                  

array _a{*} a:;                                                                                                                       

array _b{*} b:;                                                                                                                       

array _c{*} c:;                                                                                                                       

do i=1 to dim(_a);                                                                                                                    

  do j=1 to dim(_b);                                                                                                                   

   do k=1 to dim(_b);                                                                                                                  

   _a_name=vname(_a{i}); _b_name=vname(_b{j}); _c_name=vname(_c{k});                                                                   

   _a_value=_a{i}; _b_value=_b{j}; _c_value=_c{k};                                                                                     

                                                                                                                                       

   output;                                                                                                                             

  end;                                                                                                                                 

end;                                                                                                                                   

end;                                                                                                                                   

run;                                                                                                                                   

                                                                                                                                       

proc sql;                                                                                                                              

create table want as                                                                                                                   

select _a_name,_b_name, _c_name,_a_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio                                      

  from temp                                                                                                                            

   group by  _a_name,_b_name,_c_name,_a_value,_b_value, _c_value;                                                                      

quit;                                       

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

and if you want every family at least has one member , use this code:

if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; 

Message was edited by: xia keshan

View solution in original post

10 REPLIES 10
Reeza
Super User

Look into the class, ways and type option on proc means instead.

That's a weird ratio calculation, its usually target = 1/ count(all obs in calculation).

Assuming your definition is correct, the following may help you get started.

If you post sample output it will be easier to help.

proc means data=have noprint;

class a1 a2 a3 b1 b2 b3 c1 c2 c3 target;

var target;

output out=summary n(target)=target;

run;

You can calculate the ratio in a second step.

hhchenfx
Barite | Level 11

Thank you, Reeza for your suggestion.

I did try proc means and proc summary. However, there is limitation in terms of the number of variables in class statement.

That's why I try this method.

HHC

Ksharp
Super User

Graycode() is an excellent choice.

data have;                                                                                                                             
   input target a1-a3 b1-b3 c1-c3;                                                                                                     
   datalines4;                                                                                                                          
0 1 1 1 1 1 1 2 3 4                                                                                                                    
0 0 0 1 1 0 0 1 2 4                                                                                                                    
1 0 0 1 1 0 0 1 2 4                                                                                                                    
1 1 0 0 0 1 1 1 2 1                                                                                                                    
1 0 0 0 1 0 1 2 0 2                                                                                                                    
0 1 1 1 1 0 0 1 3 0                                                                                                                    
0 0 0 0 1 0 0 1 2 3                                                                                                                    
1 0 1 1 0 1 0 0 0 1                                                                                                                    
0 1 0 0 0 0 0 1 1 1                                                                                                                    
;;;;                                                                                                                                   
   run;                                                                                                                                
 


data want(keep=target name value);
set have;
length name value $ 40;
array x[9];
array y[9] a1-a3 b1-b3 c1-c3;
n=dim(x);
k=-1;
nsubs=2**n;
do i=1 to nsubs;
rc=graycode(k, of x
  • ); do j=1 to n;   if x=1 then do;      name=catx(' ',name,vname(y));       value=catx(' ',value,y);    end; end; if find(name,'a','i') and find(name,'b','i') and find(name,'c','i') then output; call missing(name,value); end; run;
  • Xia Keshan

    hhchenfx
    Barite | Level 11

    Hi Ksharp,

    You code create full combination which include a1a2a3 b1b2b3 c1c2c3. However, I would like to create maximum 2 combination within family a; and maximum 3 combination within family b.

    So the "longest" name should be a1a2 b1b2b3 c1; a1a3 b1b2b3 c1; ....

    Also the output file should contain the number of target=0, number of target=1 for each combination so that I can calculate the ratio.

    I am still struggle with it.

    Thank you for your help.

    HHC

    Ksharp
    Super User

    Sure. graycode() contains everything.

     
    data have;                                                                                                                             
       input target a1-a3 b1-b3 c1-c3;                                                                                                     
       datalines4;                                                                                                                          
    0 1 1 1 1 1 1 2 3 4                                                                                                                    
    0 0 0 1 1 0 0 1 2 4                                                                                                                    
    1 0 0 1 1 0 0 1 2 4                                                                                                                    
    1 1 0 0 0 1 1 1 2 1                                                                                                                    
    1 0 0 0 1 0 1 2 0 2                                                                                                                    
    0 1 1 1 1 0 0 1 3 0                                                                                                                    
    0 0 0 0 1 0 0 1 2 3                                                                                                                    
    1 0 1 1 0 1 0 0 0 1                                                                                                                    
    0 1 0 0 0 0 0 1 1 1                                                                                                                    
    ;;;;                                                                                                                                   
       run;                                                                                                                                
     
    
    
    data want(keep=target name value);
    set have;
    length name value $ 40;
    array x[9];
    array y[9] a1-a3 b1-b3 c1-c3;
    n=dim(x);
    k=-1;
    nsubs=2**n;
    do i=1 to nsubs;
    rc=graycode(k, of x
  • ); do j=1 to n;   if x=1 then do;      name=catx(' ',name,vname(y));       value=catx(' ',value,y);    end; end; if count(name,'a','i') le 2 and count(name,'b','i') le 3 and count(name,'c','i')=1 then output; call missing(name,value); end; run;
  • Xia Keshan

    hhchenfx
    Barite | Level 11

    I still cannot get the count for number target=0 and number target =1?

    This new code is work nicely in terms of creating combination.

    Thank Ksharp.

    HHC

    Ksharp
    Super User

    My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

    and if you want every family at least has one member , use this code:

    if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; 

    Message was edited by: xia keshan

    hhchenfx
    Barite | Level 11

    Your are right, Ksharp.

    The new code is very powerful as it create 1 factor only, 2 factor and then combination cross family as planned.  I will study it.

    However, there is advantage of the original approach (If I remember correctly, you are the one who help me creating it).

    It create variables for name and variables for value. It helps me better in later analysis.

    So over the night, I found a solution to it, I add 1 more loop for a family for a level.

    The code below has 2 level for family a.

    data temp1(keep=_a1_: _a2_: _b_: _c_: target);                                                                                                

    set have;                                                                                                                            

    length _a1_name _a2_name _b_name _c_name $ 20;     

    array _a1{*} a:;

    array _a2{*} a:;

    array _b{*} b:;                                                                                                                      

    array _c{*} c:;

    do i=1 to dim(_a1); 

    do ii=1 to dim(_a2); 

      do j=1 to dim(_b);                                                                                                                  

       do k=1 to dim(_b);                                                                                                                 

       _a1_name=vname(_a1{i});_a2_name=vname(_a2{ii}); _b_name=vname(_b{j}); _c_name=vname(_c{k});                                                                  

       _a1_value=_a1{i};_a2_value=_a2{ii}; _b_value=_b{j}; _c_value=_c{k};                                                                                    

                                                                                                                                          

       output;                                                                                                                            

      end;                                                                                                                                

    end;                                                                                                                                   

    end;

    end;

    run;                                                                                                                                 

                                                                                                                                          

    proc sql;                                                                                                                             

    create table want1 as                                                                                                                  

    select _a1_name,_a2_name,_b_name, _c_name,_a1_value,_a2_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio                                     

      from temp1                                                                                                                           

       group by  _a1_name,_a2_name,_b_name,_c_name,_a1_value,_a2_value,_b_value, _c_value;                                                                     

    quit;   

    Ksharp
    Super User

    There is no need to use two array for the same set of variables.

    And I am wondering A1A1B1C1 is also the combination you want ?

    Xia Keshan

    hhchenfx
    Barite | Level 11

    You are right, Xia.

    The A1A1 shouldn't be there. I should change to do ii=i+1 to dim(_a2);

    HHC

    sas-innovate-2024.png

    Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

    Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

     

    Register now!

    How to Concatenate Values

    Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

    Find more tutorials on the SAS Users YouTube channel.

    Click image to register for webinarClick image to register for webinar

    Classroom Training Available!

    Select SAS Training centers are offering in-person courses. View upcoming courses for:

    View all other training opportunities.

    Discussion stats
    • 10 replies
    • 937 views
    • 6 likes
    • 3 in conversation