BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hello Everyone,

I have the following problem and would like to seek your help.

My file has variable Target (taking value 0 or 1) and 2 sets of condition (each take value of 0 or 1): A1, A2 ... An AND B1, B2 ... Bn.

I want to create a summary of number of taget=0 and target=1 for each combination of the 2 set of condition

say A1B1 (00, 01, 10, 11), A1B2...

I can write the simple macro to deal with them 1 by 1 as below. But if the number of condition getting large, just this macro is not helpful.

So I guess what I look for is a kind of "do for A=A1 to An" and another "do for B=B1 to Bn".

Any help is very much appreciated.

Thank you,

HHC

data have;

  input target a1 a2 a3 b1 b2 b3;

  datalines;

0 1 1 1 1 1 1

1 0 0 1 1 0 0

1 1 0 0 0 1 1

1 0 0 0 1 0 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

;run;

proc sort data=have;

by a1 b1 target ;run;

proc means noprint;

by a1 b1 target;

output out=summary_final (drop = _type_ _freq_)

n=n_a1b1; run;

data summary_final; set summary_final;

n=_n_; run; *this variable will be used to merge with the macro output below;

%macro analyze (X=, Y=);

proc sort data=have;

by &X &Y target ;run;

proc means noprint;

by &X &Y target;

output out=summary_macro (drop = _type_ _freq_)

n=n_&X&Y; run;

data summary_macro; set summary_macro;

n=_n_; run; *this variable will be used to merge with the macro output below;

proc sql;

create table summary_final as select a.*, b.n_&X&Y

from summary_final as a left join summary_macro as b

on a.n=b.n;quit;

%mend;

%analyze (X=a1,Y=b2);

%analyze (X=a1,Y=b3);

.....

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User
data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

View solution in original post

13 REPLIES 13
data_null__
Jade | Level 19

CLASS and TYPES.

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
proc summary data=have chartype;
  
class a: b: target;
   types (a:)*(b:)*target;
   output  out=sumtest;
   run;
hhchenfx
Barite | Level 11

Wow!!! thank you, Data_null_.

I am really amazed to see your code.

I still want to do a number of other step for each combination and for illustration purpose, I skip some of them (if I had known of the Class and Type, I would have include those steps).

So I wonder if you guys could have other solution so that I can have more flexibility in term of data manipulating.

Thank you.

HHC

ballardw
Super User

The proc summary output is pretty flexible for selecting the combination types you are looking for with by using the _TYPE_ automatic variable.

Reeza
Super User

You can also look into the classdata= option, where you can specify the specific combinations that you're looking for, rather than all.

hhchenfx
Barite | Level 11

Hi Data_null_,

If the list of condition is only A1 A2 A3... An and I want to create the summary of Target for each pair say A1A2; A1A3 ....

This code below can be used but it (a b c target) the same.

Is there any better option to do it?

Thank you,

HHC

proc summary data=have2 chartype;

class a b c: target;

ways 3;

output  out=sumtest5;

run;

Ksharp
Super User

What output do you like ?

data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;

proc freq data=temp noprint;
table _a_name*_b_name*_a_value*_b_value*target/list out=want;
run;



Ksharp

Message was edited by: xia keshan

hhchenfx
Barite | Level 11

Thank you for your different approaches.

At first, I was looking for the code by Ksharp. Now since reading Data_null_ and other, I find it more convenient to use Proc Summary.

The ultimate goal of course is to calculate the ratio of target=1/target=0 .  I use the below lag code to calculate this ratio with the condition of the same type.

 

data sumtest; set sumtest;

  drop lf lt ltype;

  lf=lag(freq);

  lt=lag(target);

  ltype=lag(type);

   if target=1 and lt=0 and type=ltype then ratio=freq/lf; run;

Because there is no guarantee that all combinations of condition has a value, ONE potential problem is that: I might take the (number of target=1) for 1 condition and divide by (number of target=0) for another condition.

it is the situation (with the below data) for a=0, c=0  AND a=0,c=1.

So I wonder if there is any solution to that.

Thank you for your support.

HHC

data
have;

     input
target a b c d e f ;

     datalines;

0 1 1 1 1 1 1

  1 0 0 1 1 0 0

1 1 0 0 0 1 1

0 1 1 1 1 0 0

0 0 0 0 1 0 0

1 0 1 1 0 1 0

0 1 0 0 0 0 0

1 1 1 1 1 1 1

  ;;;;

    run;

proc summary data=have chartype;

class a b c target;

ways 3;

output  out=sumtest;

run;

Ksharp
Super User

Post the final output you need and explain your logic .

hhchenfx
Barite | Level 11

a



b



c



target



_TYPE_



_FREQ_



ratio





0


0



0



0111



2







0



0



1



0111



1



  1. 0.5



0



1



1



0111



1







1



1



0



0111



2







1



1



1



0111



2



1



0





0



0



1011



1





0





1



1



1011



2



2



1





0



0



1011



1





1





0



1



1011



1



1



1





1



0



1011



2





1





1



1



1011



1



  1. 0.5

0



0





0



1101



1





0



0





1



1101



1



1



0



1





1



1101



1





1



0




0



1101



1





1



0





1



1101



1



1



1



1





0



1101



2





1



1





1



1101



1



  1. 0.5

0



0



0





1110



1





0



0



1





1110



1





0



1



1





1110



1





1



0



0





1110



2





1



1



1




1110



3




The output I need is the table above BUT the ratio highlighted should NOT be there since it compares the _FREQ_ of Target for 2 different combinations  (a=0, c=0) and (a=0, c=1). Ideally, we should have 1 line in between those highlighted for a=0, c=0 and Target=1.

(not sure why table inflated to that big)

Thank you,

HHC

Ksharp
Super User
data have;
   input target a1-a3 b1-b3;
   datalines;
0 1 1 1 1 1 1
1 0 0 1 1 0 0
1 1 0 0 0 1 1
1 0 0 0 1 0 1
0 1 1 1 1 0 0
0 0 0 0 1 0 0
1 0 1 1 0 1 0
0 1 0 0 0 0 0
;;;;
   run;
data temp(keep=_a_: _b_: target);
 set have;
 length _a_name _b_name $ 20;
 array _a{*} a:;
 array _b{*} b:;
 do i=1 to dim(_a);
  do j=1 to dim(_b);
   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
run;
proc sql;
create table want as
 select _a_name,_b_name,_a_value,_b_value,sum(target=1)/sum(target=0) as per
  from temp
   group by  _a_name,_b_name,_a_value,_b_value;
quit;

Ksharp

hhchenfx
Barite | Level 11

Thank you, Ksharp.

First, I find that the temp file creates 17 duplicate observation. I am not sure what cause it.

sort data=temp out=temp1 nodupkey;

target _a_name _b_name _a_value _b_value; run;

Second, if there is only 1 list of condition (a b c d e f) instead of a1-a3 and b1-b3, and I want to do the combination of this list with the following code, there are still a lot of duplicate observation.

definitely, the problem is kind of AB and then BA.

I don't know how to eliminate this. I wonder if you could help me.

Thank you,

HHC

data temp1(keep=_a_: _b_: target);
set have;

array _a{*}  a b c;
  do i=1 to dim(_a);
  do j=i+1 to dim(_a);

   if _a{i}^=_b{j} then do;

   _a_name=vname(_a{i}); _b_name=vname(_b{j});
   _a_value=_a{i}; _b_value=_b{j};
   output;
  end;
end;
end;
run;

hhchenfx
Barite | Level 11

Sorry Ksharp.

I think I am wrong somewhere. Your code is perfect.

Might be it is too much for me today.

Thank you,

HHC

hhchenfx
Barite | Level 11

Learned it now Ksharp!

Many thanks for a new method (to me).

HHC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1017 views
  • 6 likes
  • 5 in conversation