BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
hhchenfx
Barite | Level 11

Hi Everyone,

I have a dataset that have a Target Variable and a number of independent variables, say a1 a2 … a6 as below.

I want to create a summary file of the target value for each combination of independent variable and their value.

Basically the summary will answer the question:

If a1=5 and a2=9,       how many observations have target=0 and how many have target=1.

If a1=4 and a2=1,       how many observations have target=0 and how many have target=1.

If a1=5 and a3=1,       how many observations have target=0 and how many have target=1.

…..

I really appreciate it if you could help me with this problem.

Thank you,

HHC

data have;

  input target a1 a2 a3 a4 a5 a6;

  datalines;

0 5 9 1 0 8 1

1 4 0 1 1 5 0

1 8 1 2 3 1 1

1 3 3 0 2 0 6

0 4 1 1 7 0 0

0 3 3 0 9 0 3

1 2 1 1 2 1 2

0 1 2 0 3 0 4

;run;

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

It sounds like you need the PROC SUMMARY approach, with this additional statement:

ways 2;

I guess if you add WAYS 2, then NWAYS should be removed.

That should do it.

View solution in original post

7 REPLIES 7
LinusH
Tourmaline | Level 20

proc summary or SQL group by...?

proc sql;

select a1, a2, target, count(*) as nobs

from have

group by a1, a2, target;

quit;

Data never sleeps
hhchenfx
Barite | Level 11

Thank you for your answer.

My problem is that I want SAS to create the summary across ALL combination of variable and ALL value.

I should have clarify it in the original post.

HHC

LinusH
Tourmaline | Level 20

So what combinations do wish to calculate? I can't see the pattern in your example...?

Data never sleeps
hhchenfx
Barite | Level 11

Actually, there are 2 levels of combinations: :smileyinfo: combination of variables and (ii) combination of value of variable.

For the first level, I want all combination of a1-a6, which are:

a1a2, a1a3,a1a4,a1a5,a1a6;

a2a3,a2a4,a2a5,a2a5;

a3a4,a3a5,a3a6;

a4a5,a4a6;

and a5a6.

For each of the above variable-combination I want value combination. Take the first pair a1a2 as a example:

a1 has value 1 2 3 4 5 8

a2 has value 1 2 3 9

I want to report how many observation with target=1 and target=0 for each value-combination as below:

a1=1 and a2=1: 0 observation with target=0 ; 0 observation with target=1 (In fact, for simplicity this report can be ignored since there is no such combination in the data)

a1=1 and a2=2:  1 observation with target=0; 0 observation with target=1

….

a1=8 and a2=9: …

So I want a FULL combination of both levels (variables and value).

HHC

Astounding
PROC Star

It sounds like you need the PROC SUMMARY approach, with this additional statement:

ways 2;

I guess if you add WAYS 2, then NWAYS should be removed.

That should do it.

hhchenfx
Barite | Level 11

Thank you,

I think I got it now.

proc sort data=have; by target;run;

proc summary data=have;

by target;

class a1 a2 a3 a4 a5 a6;

ways 2;

output out=want (drop=_TYPE_);

run;

/*this step will help you to put all name in a field. you can feel free to delete a1-a6*/

data final;

set want;

length name1 name2 $5;

array _a{*} a:;

count=0;

do i=1 to dim(_a);

  if _a{i}^=. then do; count=count+1;

  if count=1 then do;

  name1=vname(_a{i});

  value1=_a{i};

  end;

  else if count=2 then do;

  name2=vname(_a{i});

  value2=_a{i};

  end;

end;

end;

run;

Astounding
PROC Star

The easiest, soundest way would be to use PROC FREQ:

proc freq data=have;

   tables a1 * a2 * a3 * a4 * a5 * a6 * target / noprint out=want (drop=percent);

run;

You will need enough memory to track all the combinations, and you will get a separate observation for each value of TARGET.  It you are sure that TARGET always takes on values of 0 and 1, you could shrink the size of your data set by switching to PROC SUMMARY.  In that case, one variable would show the total number of observations, and a second variable would contain the sum of all TARGET values.

proc summary data=have nway;

   class a1 a2 a3 a4 a5 a6;

   var target;

   output out=want (drop=_type_) sum=target_total;

run;

If any of the CLASS variables could contain a missing value, you would have to add the MISSING option.

Good luck.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 993 views
  • 0 likes
  • 3 in conversation