Hi Everyone,
I have a dataset that have a Target Variable and a number of independent variables, say a1 a2 … a6 as below.
I want to create a summary file of the target value for each combination of independent variable and their value.
Basically the summary will answer the question:
If a1=5 and a2=9, how many observations have target=0 and how many have target=1.
If a1=4 and a2=1, how many observations have target=0 and how many have target=1.
…
If a1=5 and a3=1, how many observations have target=0 and how many have target=1.
…..
I really appreciate it if you could help me with this problem.
Thank you,
HHC
data have;
input target a1 a2 a3 a4 a5 a6;
datalines;
0 5 9 1 0 8 1
1 4 0 1 1 5 0
1 8 1 2 3 1 1
1 3 3 0 2 0 6
0 4 1 1 7 0 0
0 3 3 0 9 0 3
1 2 1 1 2 1 2
0 1 2 0 3 0 4
;run;
It sounds like you need the PROC SUMMARY approach, with this additional statement:
ways 2;
I guess if you add WAYS 2, then NWAYS should be removed.
That should do it.
proc summary or SQL group by...?
proc sql;
select a1, a2, target, count(*) as nobs
from have
group by a1, a2, target;
quit;
Thank you for your answer.
My problem is that I want SAS to create the summary across ALL combination of variable and ALL value.
I should have clarify it in the original post.
HHC
So what combinations do wish to calculate? I can't see the pattern in your example...?
Actually, there are 2 levels of combinations: :smileyinfo: combination of variables and (ii) combination of value of variable.
For the first level, I want all combination of a1-a6, which are:
a1a2, a1a3,a1a4,a1a5,a1a6;
a2a3,a2a4,a2a5,a2a5;
a3a4,a3a5,a3a6;
a4a5,a4a6;
and a5a6.
For each of the above variable-combination I want value combination. Take the first pair a1a2 as a example:
a1 has value 1 2 3 4 5 8
a2 has value 1 2 3 9
I want to report how many observation with target=1 and target=0 for each value-combination as below:
a1=1 and a2=1: 0 observation with target=0 ; 0 observation with target=1 (In fact, for simplicity this report can be ignored since there is no such combination in the data)
a1=1 and a2=2: 1 observation with target=0; 0 observation with target=1
….
a1=8 and a2=9: …
So I want a FULL combination of both levels (variables and value).
HHC
It sounds like you need the PROC SUMMARY approach, with this additional statement:
ways 2;
I guess if you add WAYS 2, then NWAYS should be removed.
That should do it.
Thank you,
I think I got it now.
proc sort data=have; by target;run;
proc summary data=have;
by target;
class a1 a2 a3 a4 a5 a6;
ways 2;
output out=want (drop=_TYPE_);
run;
/*this step will help you to put all name in a field. you can feel free to delete a1-a6*/
data final;
set want;
length name1 name2 $5;
array _a{*} a:;
count=0;
do i=1 to dim(_a);
if _a{i}^=. then do; count=count+1;
if count=1 then do;
name1=vname(_a{i});
value1=_a{i};
end;
else if count=2 then do;
name2=vname(_a{i});
value2=_a{i};
end;
end;
end;
run;
The easiest, soundest way would be to use PROC FREQ:
proc freq data=have;
tables a1 * a2 * a3 * a4 * a5 * a6 * target / noprint out=want (drop=percent);
run;
You will need enough memory to track all the combinations, and you will get a separate observation for each value of TARGET. It you are sure that TARGET always takes on values of 0 and 1, you could shrink the size of your data set by switching to PROC SUMMARY. In that case, one variable would show the total number of observations, and a second variable would contain the sum of all TARGET values.
proc summary data=have nway;
class a1 a2 a3 a4 a5 a6;
var target;
output out=want (drop=_type_) sum=target_total;
run;
If any of the CLASS variables could contain a missing value, you would have to add the MISSING option.
Good luck.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.