Hello Everyone,
I have the following problem and would like to seek your help.
My file has variable Target (taking value 0 or 1) and different family of condition variables:
A1, A2 ... ;
B1, B2 ... Bn;
C1…Cn.
I have a code below that create a summary of number of taget=0 and target=1 for each combination of variable and value say
A1B1C1 (0, 0,0), A1B1C1 (0, 0, 1) …..
A1B1C2 (0, 0,0), A1B1C2 (0, 0, 1) …..
….
And then get the ratio of there 2 number.
This code only get 1 variable from 1 family.
Now, I am trying to modify the code so that the output file will create combination within each family, up to a chosen value of combination. Such as:
A1 B1 C1
A1A2 B1 C1 (family A has 2 level of combination)
…
A1 B1 C1
A1 B1B2 C1
A1 B1B2B3 C1 (family B has 3 level of combination)
….
A1A2 B1B2 C1
A1A2 B1BB3 C1
I should be able to dictate the maximum within family combination.
Any help is very much appreciated.
Thank you,
HHC
data have;
input target a1-a3 b1-b3 c1-c3;
datalines;
0 1 1 1 1 1 1 2 3 4
0 0 0 1 1 0 0 1 2 4
1 0 0 1 1 0 0 1 2 4
1 1 0 0 0 1 1 1 2 1
1 0 0 0 1 0 1 2 0 2
0 1 1 1 1 0 0 1 3 0
0 0 0 0 1 0 0 1 2 3
1 0 1 1 0 1 0 0 0 1
0 1 0 0 0 0 0 1 1 1
;;;;
run;
data temp(keep=_a_: _b_: _c_: target);
set have;
length _a_name _b_name _c_name $ 20;
array _a{*} a:;
array _b{*} b:;
array _c{*} c:;
do i=1 to dim(_a);
do j=1 to dim(_b);
do k=1 to dim(_b);
_a_name=vname(_a{i}); _b_name=vname(_b{j}); _c_name=vname(_c{k});
_a_value=_a{i}; _b_value=_b{j}; _c_value=_c{k};
output;
end;
end;
end;
run;
proc sql;
create table want as
select _a_name,_b_name, _c_name,_a_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio
from temp
group by _a_name,_b_name,_c_name,_a_value,_b_value, _c_value;
quit;
My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .
and if you want every family at least has one member , use this code:
if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output;
Message was edited by: xia keshan
Look into the class, ways and type option on proc means instead.
That's a weird ratio calculation, its usually target = 1/ count(all obs in calculation).
Assuming your definition is correct, the following may help you get started.
If you post sample output it will be easier to help.
proc means data=have noprint;
class a1 a2 a3 b1 b2 b3 c1 c2 c3 target;
var target;
output out=summary n(target)=target;
run;
You can calculate the ratio in a second step.
Thank you, Reeza for your suggestion.
I did try proc means and proc summary. However, there is limitation in terms of the number of variables in class statement.
That's why I try this method.
HHC
Graycode() is an excellent choice.
data have; input target a1-a3 b1-b3 c1-c3; datalines4; 0 1 1 1 1 1 1 2 3 4 0 0 0 1 1 0 0 1 2 4 1 0 0 1 1 0 0 1 2 4 1 1 0 0 0 1 1 1 2 1 1 0 0 0 1 0 1 2 0 2 0 1 1 1 1 0 0 1 3 0 0 0 0 0 1 0 0 1 2 3 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 1 ;;;; run; data want(keep=target name value); set have; length name value $ 40; array x[9]; array y[9] a1-a3 b1-b3 c1-c3; n=dim(x); k=-1; nsubs=2**n; do i=1 to nsubs; rc=graycode(k, of x
Xia Keshan
Hi Ksharp,
You code create full combination which include a1a2a3 b1b2b3 c1c2c3. However, I would like to create maximum 2 combination within family a; and maximum 3 combination within family b.
So the "longest" name should be a1a2 b1b2b3 c1; a1a3 b1b2b3 c1; ....
Also the output file should contain the number of target=0, number of target=1 for each combination so that I can calculate the ratio.
I am still struggle with it.
Thank you for your help.
HHC
Sure. graycode() contains everything.
data have; input target a1-a3 b1-b3 c1-c3; datalines4; 0 1 1 1 1 1 1 2 3 4 0 0 0 1 1 0 0 1 2 4 1 0 0 1 1 0 0 1 2 4 1 1 0 0 0 1 1 1 2 1 1 0 0 0 1 0 1 2 0 2 0 1 1 1 1 0 0 1 3 0 0 0 0 0 1 0 0 1 2 3 1 0 1 1 0 1 0 0 0 1 0 1 0 0 0 0 0 1 1 1 ;;;; run; data want(keep=target name value); set have; length name value $ 40; array x[9]; array y[9] a1-a3 b1-b3 c1-c3; n=dim(x); k=-1; nsubs=2**n; do i=1 to nsubs; rc=graycode(k, of x
Xia Keshan
I still cannot get the count for number target=0 and number target =1?
This new code is work nicely in terms of creating combination.
Thank Ksharp.
HHC
My dataset has already contains target variable. Didn't you notice it ? You can use your SQL to get that ratio .
and if you want every family at least has one member , use this code:
if count(name,'a','i') in (1,2) and count(name,'b','i') in (1,2,3) and count(name,'c','i')=1 then output;
Message was edited by: xia keshan
Your are right, Ksharp.
The new code is very powerful as it create 1 factor only, 2 factor and then combination cross family as planned. I will study it.
However, there is advantage of the original approach (If I remember correctly, you are the one who help me creating it).
It create variables for name and variables for value. It helps me better in later analysis.
So over the night, I found a solution to it, I add 1 more loop for a family for a level.
The code below has 2 level for family a.
data temp1(keep=_a1_: _a2_: _b_: _c_: target);
set have;
length _a1_name _a2_name _b_name _c_name $ 20;
array _a1{*} a:;
array _a2{*} a:;
array _b{*} b:;
array _c{*} c:;
do i=1 to dim(_a1);
do ii=1 to dim(_a2);
do j=1 to dim(_b);
do k=1 to dim(_b);
_a1_name=vname(_a1{i});_a2_name=vname(_a2{ii}); _b_name=vname(_b{j}); _c_name=vname(_c{k});
_a1_value=_a1{i};_a2_value=_a2{ii}; _b_value=_b{j}; _c_value=_c{k};
output;
end;
end;
end;
end;
run;
proc sql;
create table want1 as
select _a1_name,_a2_name,_b_name, _c_name,_a1_value,_a2_value,_b_value, _c_value,sum(target=1)/sum(target=0) as Ratio
from temp1
group by _a1_name,_a2_name,_b_name,_c_name,_a1_value,_a2_value,_b_value, _c_value;
quit;
There is no need to use two array for the same set of variables.
And I am wondering A1A1B1C1 is also the combination you want ?
Xia Keshan
You are right, Xia.
The A1A1 shouldn't be there. I should change to do ii=i+1 to dim(_a2);
HHC
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.