Hi Everyone,
I have a dataset that have a Target Variable and a number of independent variables, say a1 a2 … a6 as below.
I want to create a summary file of the target value for each combination of independent variable and their value.
Basically the summary will answer the question:
If a1=5 and a2=9, how many observations have target=0 and how many have target=1.
If a1=4 and a2=1, how many observations have target=0 and how many have target=1.
…
If a1=5 and a3=1, how many observations have target=0 and how many have target=1.
…..
I really appreciate it if you could help me with this problem.
Thank you,
HHC
data have;
input target a1 a2 a3 a4 a5 a6;
datalines;
0 5 9 1 0 8 1
1 4 0 1 1 5 0
1 8 1 2 3 1 1
1 3 3 0 2 0 6
0 4 1 1 7 0 0
0 3 3 0 9 0 3
1 2 1 1 2 1 2
0 1 2 0 3 0 4
;run;
It sounds like you need the PROC SUMMARY approach, with this additional statement:
ways 2;
I guess if you add WAYS 2, then NWAYS should be removed.
That should do it.
proc summary or SQL group by...?
proc sql;
select a1, a2, target, count(*) as nobs
from have
group by a1, a2, target;
quit;
Thank you for your answer.
My problem is that I want SAS to create the summary across ALL combination of variable and ALL value.
I should have clarify it in the original post.
HHC
So what combinations do wish to calculate? I can't see the pattern in your example...?
Actually, there are 2 levels of combinations: :smileyinfo: combination of variables and (ii) combination of value of variable.
For the first level, I want all combination of a1-a6, which are:
a1a2, a1a3,a1a4,a1a5,a1a6;
a2a3,a2a4,a2a5,a2a5;
a3a4,a3a5,a3a6;
a4a5,a4a6;
and a5a6.
For each of the above variable-combination I want value combination. Take the first pair a1a2 as a example:
a1 has value 1 2 3 4 5 8
a2 has value 1 2 3 9
I want to report how many observation with target=1 and target=0 for each value-combination as below:
a1=1 and a2=1: 0 observation with target=0 ; 0 observation with target=1 (In fact, for simplicity this report can be ignored since there is no such combination in the data)
a1=1 and a2=2: 1 observation with target=0; 0 observation with target=1
….
a1=8 and a2=9: …
So I want a FULL combination of both levels (variables and value).
HHC
It sounds like you need the PROC SUMMARY approach, with this additional statement:
ways 2;
I guess if you add WAYS 2, then NWAYS should be removed.
That should do it.
Thank you,
I think I got it now.
proc sort data=have; by target;run;
proc summary data=have;
by target;
class a1 a2 a3 a4 a5 a6;
ways 2;
output out=want (drop=_TYPE_);
run;
/*this step will help you to put all name in a field. you can feel free to delete a1-a6*/
data final;
set want;
length name1 name2 $5;
array _a{*} a:;
count=0;
do i=1 to dim(_a);
if _a{i}^=. then do; count=count+1;
if count=1 then do;
name1=vname(_a{i});
value1=_a{i};
end;
else if count=2 then do;
name2=vname(_a{i});
value2=_a{i};
end;
end;
end;
run;
The easiest, soundest way would be to use PROC FREQ:
proc freq data=have;
tables a1 * a2 * a3 * a4 * a5 * a6 * target / noprint out=want (drop=percent);
run;
You will need enough memory to track all the combinations, and you will get a separate observation for each value of TARGET. It you are sure that TARGET always takes on values of 0 and 1, you could shrink the size of your data set by switching to PROC SUMMARY. In that case, one variable would show the total number of observations, and a second variable would contain the sum of all TARGET values.
proc summary data=have nway;
class a1 a2 a3 a4 a5 a6;
var target;
output out=want (drop=_type_) sum=target_total;
run;
If any of the CLASS variables could contain a missing value, you would have to add the MISSING option.
Good luck.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.