Summary of statistics for ALL Combination from a set of variables

Super Contributor
Posts: 419

Hi Everyone,

I have a dataset with a variable named target (taking the value 0 or 1) and n independent variables.
I want to create a summary file that reports the number of observations with target=0 and the number with target=1 for each combination of values of the independent variables.

data have;
  input target a1 a2 a3 a4 a5 a6;
  datalines;
0 1 9 1 0 8 1
1 1 0 1 1 5 0
1 1 9 2 3 1 1
1 2 3 0 2 0 6
0 2 9 1 7 0 0
0 3 3 0 9 0 3
1 2 1 1 2 1 2
0 1 2 0 3 0 4
;
run;

Basically, the summary should include all possible combinations (from 1 factor up to 6 factors, drawn from the 6 independent variables), such as:

If a1=1, how many observations have target=0 and how many have target=1?
If a1=1 and a2=9, how many observations ....
If a1=1 and a2=9 and a3=1, how many observations ....
...
If a1=1 and a2=9 and a3=1 and a4=0 and a5=8 and a6=1, ....

If a1=2 how many observations ...
...
...

I don't know how to do it, so any help is very much appreciated.

I am thinking about creating a condition file that summarizes the distinct values of each independent variable and then running some kind of DO loop to create the file I want.
The code for the distinct values is below, in case you need it.

Thank you for your help as always.

HHC


proc summary data=have missing chartype;
   class a:;
   ways 1;
   output out=distinct(drop=_type_  _freq_) / levels;
   run;
proc print;
   run;

proc sort data=distinct;
   by _level_;
   run;
data condition;
   update distinct(obs=0) distinct;
   by _level_;
   run;

Respected Advisor
Posts: 3,799

Re: Summary of statistics for ALL Combination from a set of variables

Is it this?

data have;
  input target a1-a6;
  datalines;
0 1 9 1 0 8 1
1 1 0 1 1 5 0
1 1 9 2 3 1 1
1 2 3 0 2 0 6
0 2 9 1 7 0 0
0 3 3 0 9 0 3
1 2 1 1 2 1 2
0 1 2 0 3 0 4
;;;;
run;
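/* Sort HAVE (the most recently created data set) so BY TARGET can be used below */
proc sort;
   by target;
   run;
proc print;
   run;
/* No WAYS or TYPES statement, so every combination of the class variables */
/* a1-a6 is produced, separately for each value of TARGET                  */
proc summary data=have chartype missing;
   by target;
   class a:;
   output out=combo;
   run;
proc print;
   run;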
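
As a possible next step (a sketch only, not part of the reply above), the counts for target=0 and target=1 could be placed side by side, one row per combination. The data set names combo_sorted and combo_wide and the prefix target_ are made up for illustration; the sketch assumes combo was created by the PROC SUMMARY step above, so that it contains target, a1-a6, _type_ and _freq_.

```sas
/* Re-sort so that each combination of class values forms one BY group */
proc sort data=combo out=combo_sorted;
   by _type_ a1-a6;
   run;

/* One row per combination; _freq_ is spread into target_0 and target_1 */
proc transpose data=combo_sorted out=combo_wide(drop=_name_) prefix=target_;
   by _type_ a1-a6;
   id target;
   var _freq_;
   run;

proc print data=combo_wide;
   run;
```

A combination that occurs for only one value of target will have a missing value in the other count column.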
Super Contributor
Posts: 419

Re: Summary of statistics for ALL Combination from a set of variables

Posted in reply to data_null__

Yes, it is, Data_null.

My intention is as follows:

Get a subset of the data using an IF condition, work on that subset, export the result, then go back to the IF and work on another condition.

So your code enables me to (1) get the frequencies and (2) get the list of all combinations, so that I can iterate over them for deeper analysis of each subsample.

Thank you.

HHC

Super Contributor
Posts: 419

Re: Summary of statistics for ALL Combination from a set of variables

Posted in reply to data_null__

Hi Data_null,

Is there a quick change to your code if I want to change from "AND" to "OR" in the conditions below?

Thank you,

HHC

If a1=1, how many observations have target=0 and how many have target=1?
If a1=1 OR a2=9, how many observations ....
If a1=1 OR a2=9 OR a3=1, how many observations ....
...
If a1=1 OR a2=9 OR a3=1 OR a4=0 OR a5=8 OR a6=1, ....

If a1=2, how many observations ...
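
For one specific OR condition, a minimal sketch (an assumption about what is wanted, not an answer given in this thread) is to filter with a WHERE statement and tabulate target. It handles a single hand-written condition against the HAVE data set from the first post; it does not enumerate every OR combination automatically.

```sas
/* Count target=0 and target=1 among observations meeting one OR condition */
proc freq data=have;
   where a1=1 or a2=9 or a3=1;
   tables target / nocum nopercent;
   run;
```

Changing the WHERE expression gives the counts for a different condition.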
