Help me with summary on combination of variables' value

Accepted Solution Solved
Reply
Super Contributor
Posts: 412
Accepted Solution

Help me with summary on combination of variables' value

Hi Everyone,

I have a dataset that have a Target Variable and a number of independent variables, say a1 a2 … a6 as below.

I want to create a summary file of the target value for each combination of independent variable and their value.

Basically the summary will answer the question:

If a1=5 and a2=9,       how many observations have target=0 and how many have target=1.

If a1=4 and a2=1,       how many observations have target=0 and how many have target=1.

If a1=5 and a3=1,       how many observations have target=0 and how many have target=1.

…..

I really appreciate it if you could help me with this problem.

Thank you,

HHC

data have;

  input target a1 a2 a3 a4 a5 a6;

  datalines;

0 5 9 1 0 8 1

1 4 0 1 1 5 0

1 8 1 2 3 1 1

1 3 3 0 2 0 6

0 4 1 1 7 0 0

0 3 3 0 9 0 3

1 2 1 1 2 1 2

0 1 2 0 3 0 4

;run;


Accepted Solutions
Solution
‎08-13-2013 10:30 AM
Super User
Posts: 5,504

Re: Help me with summary on combination of variables' value

It sounds like you need the PROC SUMMARY approach, with this additional statement:

ways 2;

I guess if you add WAYS 2, then NWAYS should be removed.

That should do it.

View solution in original post


All Replies
Super User
Posts: 5,429

Re: Help me with summary on combination of variables' value

proc summary or SQL group by...?

proc sql;

select a1, a2, target, count(*) as nobs

from have

group by a1, a2, target;

quit;

Data never sleeps
Super Contributor
Posts: 412

Re: Help me with summary on combination of variables' value

Thank you for your answer.

My problem is that I want SAS to create the summary across ALL combination of variable and ALL value.

I should have clarify it in the original post.

HHC

Super User
Posts: 5,429

Re: Help me with summary on combination of variables' value

So what combinations do wish to calculate? I can't see the pattern in your example...?

Data never sleeps
Super Contributor
Posts: 412

Re: Help me with summary on combination of variables' value

Actually, there are 2 levels of combinations: :smileyinfo: combination of variables and (ii) combination of value of variable.

For the first level, I want all combination of a1-a6, which are:

a1a2, a1a3,a1a4,a1a5,a1a6;

a2a3,a2a4,a2a5,a2a5;

a3a4,a3a5,a3a6;

a4a5,a4a6;

and a5a6.

For each of the above variable-combination I want value combination. Take the first pair a1a2 as a example:

a1 has value 1 2 3 4 5 8

a2 has value 1 2 3 9

I want to report how many observation with target=1 and target=0 for each value-combination as below:

a1=1 and a2=1: 0 observation with target=0 ; 0 observation with target=1 (In fact, for simplicity this report can be ignored since there is no such combination in the data)

a1=1 and a2=2:  1 observation with target=0; 0 observation with target=1

….

a1=8 and a2=9: …

So I want a FULL combination of both levels (variables and value).

HHC

Solution
‎08-13-2013 10:30 AM
Super User
Posts: 5,504

Re: Help me with summary on combination of variables' value

It sounds like you need the PROC SUMMARY approach, with this additional statement:

ways 2;

I guess if you add WAYS 2, then NWAYS should be removed.

That should do it.

Super Contributor
Posts: 412

Re: Help me with summary on combination of variables' value

Posted in reply to Astounding

Thank you,

I think I got it now.

proc sort data=have; by target;run;

proc summary data=have;

by target;

class a1 a2 a3 a4 a5 a6;

ways 2;

output out=want (drop=_TYPE_);

run;

/*this step will help you to put all name in a field. you can feel free to delete a1-a6*/

data final;

set want;

length name1 name2 $5;

array _a{*} a:;

count=0;

do i=1 to dim(_a);

  if _a{i}^=. then do; count=count+1;

  if count=1 then do;

  name1=vname(_a{i});

  value1=_a{i};

  end;

  else if count=2 then do;

  name2=vname(_a{i});

  value2=_a{i};

  end;

end;

end;

run;

Super User
Posts: 5,504

Re: Help me with summary on combination of variables' value

The easiest, soundest way would be to use PROC FREQ:

proc freq data=have;

   tables a1 * a2 * a3 * a4 * a5 * a6 * target / noprint out=want (drop=percent);

run;

You will need enough memory to track all the combinations, and you will get a separate observation for each value of TARGET.  It you are sure that TARGET always takes on values of 0 and 1, you could shrink the size of your data set by switching to PROC SUMMARY.  In that case, one variable would show the total number of observations, and a second variable would contain the sum of all TARGET values.

proc summary data=have nway;

   class a1 a2 a3 a4 a5 a6;

   var target;

   output out=want (drop=_type_) sum=target_total;

run;

If any of the CLASS variables could contain a missing value, you would have to add the MISSING option.

Good luck.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 326 views
  • 0 likes
  • 3 in conversation