🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barite | Level 11

## Create summary for each combination of different set of variables

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and different family of condition variables:

A1, A2 ... ;

B1, B2 ... Bn;

C1…Cn.

I have a code below that create a summary of number of taget=0 and target=1 for each combination of variable and value say

A1B1C1 (0, 0,0),           A1B1C1 (0, 0, 1)  …..

A1B1C2 (0, 0,0),           A1B1C2 (0, 0, 1)  …..

….

And then get the ratio of there 2 number.

This code only get 1 variable from 1 family.

Now, I am trying to modify the code so that the output file will create combination within each family, up to a chosen value of combination. Such as:

A1        B1        C1

A1A2    B1        C1                    (family A has 2 level of combination)

A1        B1        C1

A1        B1B2    C1

A1        B1B2B3            C1        (family B has 3 level of combination)

….

A1A2    B1B2    C1

A1A2    B1BB3 C1

I should be able to dictate the maximum within family combination.

Any help is very much appreciated.

Thank you,

HHC

data have;

input target a1-a3 b1-b3 c1-c3;

datalines;

0 1 1 1 1 1 1 2 3 4

0 0 0 1 1 0 0 1 2 4

1 0 0 1 1 0 0 1 2 4

1 1 0 0 0 1 1 1 2 1

1 0 0 0 1 0 1 2 0 2

0 1 1 1 1 0 0 1 3 0

0 0 0 0 1 0 0 1 2 3

1 0 1 1 0 1 0 0 0 1

0 1 0 0 0 0 0 1 1 1

;;;;

run;

data temp(keep=_a_: _b_: _c_: target);

set have;

length _a_name _b_name _c_name \$ 20;

array _a{*} a:;

array _b{*} b:;

array _c{*} c:;

do i=1 to dim(_a);

do j=1 to dim(_b);

do k=1 to dim(_b);

_a_name=vname(_a{i}); _b_name=vname(_b{j}); _c_name=vname(_c{k});

_a_value=_a{i}; _b_value=_b{j}; _c_value=_c{k};

output;

end;

end;

end;

run;

proc sql;

create table want as

select _a_name,_b_name, _c_name,_a_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio

from temp

group by  _a_name,_b_name,_c_name,_a_value,_b_value, _c_value;

quit;

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Create summary for each combination of different set of variables

My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

and if you want every family at least has one member , use this code:

`if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; `

Message was edited by: xia keshan

10 REPLIES 10
Super User

## Re: Create summary for each combination of different set of variables

Look into the class, ways and type option on proc means instead.

That's a weird ratio calculation, its usually target = 1/ count(all obs in calculation).

Assuming your definition is correct, the following may help you get started.

If you post sample output it will be easier to help.

proc means data=have noprint;

class a1 a2 a3 b1 b2 b3 c1 c2 c3 target;

var target;

output out=summary n(target)=target;

run;

You can calculate the ratio in a second step.

Barite | Level 11

## Re: Create summary for each combination of different set of variables

Thank you, Reeza for your suggestion.

I did try proc means and proc summary. However, there is limitation in terms of the number of variables in class statement.

That's why I try this method.

HHC

Super User

## Re: Create summary for each combination of different set of variables

Graycode() is an excellent choice.

```data have;
input target a1-a3 b1-b3 c1-c3;
datalines4;
0 1 1 1 1 1 1 2 3 4
0 0 0 1 1 0 0 1 2 4
1 0 0 1 1 0 0 1 2 4
1 1 0 0 0 1 1 1 2 1
1 0 0 0 1 0 1 2 0 2
0 1 1 1 1 0 0 1 3 0
0 0 0 0 1 0 0 1 2 3
1 0 1 1 0 1 0 0 0 1
0 1 0 0 0 0 0 1 1 1
;;;;
run;

data want(keep=target name value);
set have;
length name value \$ 40;
array x[9];
array y[9] a1-a3 b1-b3 c1-c3;
n=dim(x);
k=-1;
nsubs=2**n;
do i=1 to nsubs;
rc=graycode(k, of x);
do j=1 to n;
if x=1 then do;
name=catx(' ',name,vname(y));
value=catx(' ',value,y);
end;
end;
if find(name,'a','i') and find(name,'b','i') and find(name,'c','i') then output;
call missing(name,value);
end;
run;

```

Xia Keshan

Barite | Level 11

## Re: Create summary for each combination of different set of variables

Hi Ksharp,

You code create full combination which include a1a2a3 b1b2b3 c1c2c3. However, I would like to create maximum 2 combination within family a; and maximum 3 combination within family b.

So the "longest" name should be a1a2 b1b2b3 c1; a1a3 b1b2b3 c1; ....

Also the output file should contain the number of target=0, number of target=1 for each combination so that I can calculate the ratio.

I am still struggle with it.

Thank you for your help.

HHC

Super User

## Re: Create summary for each combination of different set of variables

Sure. graycode() contains everything.

```
data have;
input target a1-a3 b1-b3 c1-c3;
datalines4;
0 1 1 1 1 1 1 2 3 4
0 0 0 1 1 0 0 1 2 4
1 0 0 1 1 0 0 1 2 4
1 1 0 0 0 1 1 1 2 1
1 0 0 0 1 0 1 2 0 2
0 1 1 1 1 0 0 1 3 0
0 0 0 0 1 0 0 1 2 3
1 0 1 1 0 1 0 0 0 1
0 1 0 0 0 0 0 1 1 1
;;;;
run;

data want(keep=target name value);
set have;
length name value \$ 40;
array x[9];
array y[9] a1-a3 b1-b3 c1-c3;
n=dim(x);
k=-1;
nsubs=2**n;
do i=1 to nsubs;
rc=graycode(k, of x);
do j=1 to n;
if x=1 then do;
name=catx(' ',name,vname(y));
value=catx(' ',value,y);
end;
end;
if count(name,'a','i') le 2 and count(name,'b','i') le 3 and count(name,'c','i')=1 then output;
call missing(name,value);
end;
run;

```

Xia Keshan

Barite | Level 11

## Re: Create summary for each combination of different set of variables

I still cannot get the count for number target=0 and number target =1?

This new code is work nicely in terms of creating combination.

Thank Ksharp.

HHC

Super User

## Re: Create summary for each combination of different set of variables

My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .

and if you want every family at least has one member , use this code:

`if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output; `

Message was edited by: xia keshan

Barite | Level 11

## Re: Create summary for each combination of different set of variables

Your are right, Ksharp.

The new code is very powerful as it create 1 factor only, 2 factor and then combination cross family as planned.  I will study it.

However, there is advantage of the original approach (If I remember correctly, you are the one who help me creating it).

It create variables for name and variables for value. It helps me better in later analysis.

So over the night, I found a solution to it, I add 1 more loop for a family for a level.

The code below has 2 level for family a.

data temp1(keep=_a1_: _a2_: _b_: _c_: target);

set have;

length _a1_name _a2_name _b_name _c_name \$ 20;

array _a1{*} a:;

array _a2{*} a:;

array _b{*} b:;

array _c{*} c:;

do i=1 to dim(_a1);

do ii=1 to dim(_a2);

do j=1 to dim(_b);

do k=1 to dim(_b);

_a1_name=vname(_a1{i});_a2_name=vname(_a2{ii}); _b_name=vname(_b{j}); _c_name=vname(_c{k});

_a1_value=_a1{i};_a2_value=_a2{ii}; _b_value=_b{j}; _c_value=_c{k};

output;

end;

end;

end;

end;

run;

proc sql;

create table want1 as

select _a1_name,_a2_name,_b_name, _c_name,_a1_value,_a2_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio

from temp1

group by  _a1_name,_a2_name,_b_name,_c_name,_a1_value,_a2_value,_b_value, _c_value;

quit;

Super User

## Re: Create summary for each combination of different set of variables

There is no need to use two array for the same set of variables.

And I am wondering A1A1B1C1 is also the combination you want ?

Xia Keshan

Barite | Level 11

## Re: Create summary for each combination of different set of variables

You are right, Xia.

The A1A1 shouldn't be there. I should change to do ii=i+1 to dim(_a2);

HHC

Discussion stats
• 10 replies
• 1213 views
• 6 likes
• 3 in conversation