New Contributor
Posts: 4

# Produce and count all possible combinations of the values in a column

[ Edited ]

Can anyone help me to solve below problem? I want to count number of students that take all combinations of courses by gender. SAS Proc, data step or sql please.

sample data that I have, in the sample there are only 4 courses, but there are more than 100 courses in my dataset.

data student;

input Student_ID\$ Course\$ Gender\$;

datalines;

1001 Math F
1001 Computer F
1001 Physics F
1002 Math M
1002 Physics M
1003 Math M
1003 Chemstry M
1003 Computer M
1003 Physics M
1004 Chemstry M
1004 Computer M
1004 Physics M

;

Table that I want to get, combination is all the possible combinations of courses, Count_F is number of Female students take the combination of courses, Count_M is number of Male students take the combination of courses,

 Combination Count_F Count_M Math, Computer 1 1 Math, Physics 1 2 Math, Chemstry 0 1 Computer, Physics 1 2 Computer, Chemstry 0 2 Physics, Chemstry 0 2 Math, Computer, Physics 1 1 … … … Math, Computer, Physics, Chemstry 0 1
Super Contributor
Posts: 260

## Re: Produce and count all possible combinations of the values in a column

[ Edited ]

You'll need to explain more how your output is derived.

I see a Math F, and a Computer F and two Computer M's.

I see a result of Math, Computer 1M 1F.

Also, post your data in a data step.

New Contributor
Posts: 4

## Re: Produce and count all possible combinations of the values in a column

Thanks HB.

I have explained more about the question.

Super User
Posts: 6,527

## Re: Produce and count all possible combinations of the values in a column

[ Edited ]

I'm sure you realize ... if you have 10 possible courses, you are getting a table with roughly 1,000 rows.  At any rate, here is one approach.

data want;

array courses {4} Chemistry Computer Math Physics;

do until (last.Student_ID);

set have;

by Student_ID;

if course='Chemistry' then chemistry=1;

else if course='Computer' then computer=1;

else if course='Math' then math=1;

else if course='Physics' then physics=1;

end;

length combination \$ 40;

if chemistry=computer=math=physics=1 then do;

Combination = 'Chemistry, Computer, Math, Physics';

output;

end;

do k1=1 to 3;

first_course = vname(courses{k1});

if courses{k1}=1 then do k2=k1+1 to 4;

if courses{k2}=1 then do;

combination = strip(first_course) || ', ' || vname(courses{k2});

output;

end;

end;

end;

do k1=1 to 2;

first_course = vname(courses{k1});

if courses{k1}=1 then do k2=k1+1 to 3;

first_2 = strip(first_course) || ', ' || vname(courses{k2});

do k3=k2+1 to 4;

if courses{k3}=1 then do;

combination = strip(first_2) || ', ' || vname(courses{k3});

output;

end;

end;

end;

end;

keep combination gender;

run;

proc freq data=want;

tables combination * gender / norow nocol nopercent;

run;

It's untested code at the moment ... I will try to test it and update it if necessary.

Testing results:

It gets the right numbers, but ...

It's not pretty.  (You can get a nicer looking table by switching from PROC FREQ to PROC TABULATE.)

It doesn't print the zeros.  This also means if a combination doesn't exist in the data, there is no row in the report for that combination.

For more than 4 courses, there may be better approaches that aren't so lengthy and clumsy.  SAS does include functions that get all combinations (grayscale?).

Super User
Posts: 12,996

## Re: Produce and count all possible combinations of the values in a column

How many distinct courses are included in your data?

PROC Star
Posts: 1,258

## Re: Produce and count all possible combinations of the values in a column

I may have understood your requirements differently. Here's a solution that solves what I think is the problem...but I could be wrong!

Tom

``````proc sort data=student;
by Student_ID Course;
run;

data Inter01;
length Course_List \$200;
retain Course_List Count_F Count_M;
drop Student_ID Course Gender;
set student;
by Student_ID;

if first.Student_ID
then do;
Count_F = 0;
Count_M = 0;

if Gender = 'F' then
Count_F = 1;
else Count_M = 1;
Course_List = Course;
end;
else Course_List = strip(Course_List)||", "||strip(Course);

if last.Student_ID then
output;
run;

proc means data=Inter01 nway noprint;
var Count_F Count_M;
class Course_List;
output out=Want(drop=_TYPE_ _FREQ_) sum()=;
run;``````
New Contributor
Posts: 4

## Re: Produce and count all possible combinations of the values in a column

There are more than 100 courses in the my dataset.

Posts: 1,283

## Re: Produce and count all possible combinations of the values in a column

You want a frequency, by sex, of all within-student 2-way, 3-way, ... combinations of classes taken.  This is a good case for using the CALL LEXCOMBI call routine, which gets all non-missing combinations user-specified size, from a larger list of values.

``````data student;
input Student_ID\$ Course\$ Gender\$;
datalines;
1001 Math F
1001 Computer F
1001 Physics F
1002 Math M
1002 Physics M
1003 Math M
1003 Chemstry M
1003 Computer M
1003 Physics M
1004 Chemstry M
1004 Computer M
1004 Physics M
;

proc sort data=student; by student_id course;run;

data need (keep=student_id gender crse: combo_size crse_combo);
array _crse{10} \$8 _temporary_;
format crse_combo \$char100.;
retain NC 0;

set student;
by student_id;
if first.student_id then do;
call missing(of _crse{*});
NC=0;
end;
NC+1;
_crse{NC}=course;

if last.student_id and NC>1;

array crse{10} \$9;

array _i2[2];
array _i3[3];
array _i4[4];
array _i5[5];
array _i6[6];
array _i7[7];
array _i8[8];
array _i9[9];
array _i10[10];

array all_i{2:10,10}
_i2:  dum dum dum dum dum dum dum dum
_i3:  dum dum dum dum dum dum dum
_i4:  dum dum dum dum dum dum
_i5:  dum dum dum dum dum
_i6:  dum dum dum dum
_i7:  dum dum dum
_i8:  dum dum
_i9:  dum
_i10:  ;

if nc>1 then do combo_size=2 to NC;
all_i{combo_size,1}=0;
ncomb=comb(nc, combo_size);
do j=1 to ncomb;
select (combo_size);
when (2)  rc=lexcombi(nc, combo_size, of _i2[*]);
when (3)  rc=lexcombi(nc, combo_size, of _i3[*]);
when (4)  rc=lexcombi(nc, combo_size, of _i4[*]);
when (5)  rc=lexcombi(nc, combo_size, of _i5[*]);
when (6)  rc=lexcombi(nc, combo_size, of _i6[*]);
when (7)  rc=lexcombi(nc, combo_size, of _i7[*]);
when (8)  rc=lexcombi(nc, combo_size, of _i8[*]);
when (9)  rc=lexcombi(nc, combo_size, of _i9[*]);
when (10) rc=lexcombi(nc, combo_size, of _i10[*]);
end;
do h=1 to combo_size;
crse[h]=_crse[all_i{combo_size,h}];
end;
crse_combo=put(combo_size,2.) || ' ' || cat(of crse{*});
output;
end;
end;
run;

proc tabulate data=need noseps;
class gender crse_combo;
tables crse_combo=' ',gender=' '*N=' '*f=4.
/ rts=100 box='Size / Course List' misstext='0';
run;

``````

Notes:

1. The proc sort is done, so that all combinations are in alphabetical order  (i.e. there are math/physics, but no physics/math).
2. I declare index arrays for each expected combination size (I used a maximum of size 10).
3. Each array (_i2, _i3, etc.) is used with call lexcombi for its own size.  You can't use "of _I{*}" with call lexcombi for a size other than its own.   That's why you see the "select (combo_size)" group of statements.
4. By declaring a 2-dimensional array ALL_i  (one row per expected combo_size), constructed of the individual _i arrays and repeated use of the DUM variable, I can use compact statements such as   "crse[h]=_crse[all_i{combo_size,h}];" instead of another set of select statements.  Array all_i has an upper triangle of repeated use of the dum vars.
Super Contributor
Posts: 269

## Re: Produce and count all possible combinations of the values in a column

@mkeintzHere is the smart solution you provided, with slight macro usage twist to enable data driven code generation for the array processing.

``````data student;
input Student_ID\$ Course\$ Gender\$;
datalines;
1001 Math F
1001 Computer F
1001 Physics F
1002 Math M
1002 Physics M
1003 Math M
1003 Chemstry M
1003 Computer M
1003 Physics M
1004 Chemstry M
1004 Computer M
1004 Physics M
;
run;
/*Extract the count of unique courses and store it in a macro variable*/
proc sql noprint;
select count(distinct (course))
into :g_numUnqCourse
from work.student;
quit;

proc sort data=student;
by student_id course;
run;

/* Declare a macro to auto generate SAS statements */
%macro InitDynArray(p_max);
%local l_n;
%do l_n=2 %to &p_max;
%str(ARRAY _i&l_n[&l_n];)
%end;
ARRAY all_i{2:&p_max,&p_max}
%do l_n=2 %to &p_max;
_i&l_n:
%do j=1 %to %eval(&p_max - &l_n);
%str(dum )
%end;
%end;
%str(;)
%mend;

/* Declare a macro to auto generate SAS statements */
%macro dynArrayCombo(p_max);
%local l_n;
%do l_n=2 %to &p_max;
WHEN (&l_n)  rc=LEXCOMBI(nc, combo_size, of _i&l_n[*]);
%end;
%mend;
OPTIONS MPRINT; /*Display generated statements by the macro calls*/
data need (keep=student_id gender crse: combo_size crse_combo);
array _crse{&g_numUnqCourse} \$8 _temporary_;
format crse_combo \$char100.;
retain NC 0;
set student;
by student_id;

if first.student_id then
do;
call missing(of _crse{*});
NC=0;
end;

nc+1;
_crse{nc}=course;

if ((last.student_id) and (nc > 1));

ARRAY crse{&g_numUnqCourse} \$9;
%InitDynArray(&g_numUnqCourse); /* Call a macro */

if (nc > 1) then
do combo_size=2 to nc;
all_i{combo_size,1}=0;
ncomb=comb(nc, combo_size);

do j=1 to ncomb;
select (combo_size);
%dynArrayCombo(&g_numUnqCourse) /* Call a macro */
otherwise;
end; /* End of Select */

do h=1 to combo_size;
crse[h]=_crse[all_i{combo_size,h}];
end;

crse_combo=put(combo_size,2.) || ': ' || catx(',',of crse{*});
output;
end; /* End of j=1 to ncomb */
end; /* End of combo_size=2 to NC */
run;

proc tabulate data=need noseps;
class gender crse_combo;
tables crse_combo=' ',gender=' '*N=' '*f=4.
/ rts=100 box='Size / Course List' misstext='0';
run;``````

Regards,

Ahmed

Posts: 1,283

## Re: Produce and count all possible combinations of the values in a column

@AhmedAl_Attar

I appreciate your macro-izing of the code I provided.

But to be honest, if the dataset is large, I would run a PROC SQL against the entire data set just to generate the maximum number of possible courses.  I would just parameterize the macro it with a number that would surely exceed all possible course counts.

After all once you've macro-ized, you aren't subject to typing errors when coding for each combination size, so the penalty for a too-large number becomes negligible

Of course, if I had been true to efficiency-uber-alles, I would have avoided the proc sort, since the data were already sorted by student_id, but not course-within-student_id.  Instead, after reading in the array of courses for a given student, I would have used CALL SORTC of the array, followed by left-shifting of the results to avoid blank course values at the lower bound of the array.

New Contributor
Posts: 4

## Re: Produce and count all possible combinations of the values in a column

Thanks a lot Ahmed, your solution is great. I have one question for you, your code is working well when I test it in sample data set which has about 10 courses, but it gives me an error when I test in another data set which has about 100 courses in it.

Could you please give me a hint to make it work for large data set?

Super Contributor
Posts: 269

## Re: Produce and count all possible combinations of the values in a column

Hi Vicky,
Without knowing the exact nor the nature of the error you are getting, I'm guessing it would have to do with
- Character length ----> Change this line: format crse_combo \$char100.;
- Memory allocated to your SAS session ---> use -MEMSIZE 4GB on your SAS invocation command line. Increase the allocated GB as needed.

With 100 courses --> ((2**100) -1) permutations!!
Which is really a very big number: 1.26765060023E30 !!

Hope this helps,
Ahmed
Super User
Posts: 10,610

## Re: Produce and count all possible combinations of the values in a column

[ Edited ]
``````data student;
input Student_ID\$ Course\$ Gender\$;
retain value 1;
datalines;
1001 Math F
1001 Computer F
1001 Physics F
1002 Math M
1002 Physics M
1003 Math M
1003 Chemstry M
1003 Computer M
1003 Physics M
1004 Chemstry M
1004 Computer M
1004 Physics M
;

proc transpose data=student out=temp ;
by Student_ID gender;
var value;
id course;
run;

proc sql noprint;
select distinct Course into : list separated by ' '
from student;
select count(distinct course) into : n
from student;
quit;

data temp1;
set temp;
array x{*} &list ;
array y{&n};
length combination \$ 2000;
k=-1;
do i=1 to 2**&n ;
rc=graycode(k,of y{*});
yes=1;
do j=1 to &n;
if y{j}=1 and x{j}=. then yes=0;
end;
call missing(combination);
if yes then do;
do j=1 to &n;
if y{j}=1 then combination=catx(',',combination,vname(x{j})) ;
end;
if not missing(combination) then output;
end;
end;
keep gender combination;
run;
proc freq data=temp1 noprint;
table  gender*combination/out=temp2 list;
run;
proc sort data=temp2;by combination;run;
proc transpose data=temp2 out=want prefix=count_;
by combination;
var count;
id gender;
run;``````

P.S. modified. It seems it is not so easy.

Super Contributor
Posts: 269

## Re: Produce and count all possible combinations of the values in a column

Hi,

Here is yet another solution, While I have used some code from previous contributors, I've used Custom written function via Proc FCMP to produce and return all possible course combinations.

The  getCombos(values [*] \$, sep \$) \$; user function, accepts an array and a separator character, and returns a separated String of course combinations. The logic implemented inside that function was inspired from an old SUGI 29 paper .

``````
proc fcmp
outlib=sasuser.functions.customFunctions;

function getCombos(values [*] \$, sep \$) \$;
length combos \$100 l_sep \$1;
length i vCount binaryDigit 4;
length max_Type_  start i_tmp 8;
length final_combo \$1000;

l_sep=ifc (sep='','I',sep) ;

*put values[*];
vCount=0;
do i=1 to dim(values);
vCount+ifc(values[i]='',0,1);
end;

max_Type_ = 2**vCount - 1;
*put max_Type_;

/*  Loop through till max_Type_ is reached*/
do start=0 to max_Type_;
i_tmp = start;

/* Loop through all passed-in values */
do j=vCount to 1 by -1;
binaryDigit= int(i_tmp/(2**(j-1)) );
if (binaryDigit = 1) then
do;
combos = catx(',',combos,values[(vCount-j+1)]);

* Decrement i_tmp if binaryDigit is in types binary representation;
i_tmp = i_tmp - 2**(j-1);
end;
end; /*End of  j=vCount to 1 by -1*/

/* Only maintain combinations of more than 1 value */
if (index(combos,',') GT 0) then
do;
*put combos;
final_combo = catx(l_sep,final_combo,combos);
*put final_combo;
end;
combos='';
end; /* End of start=0 to max_Type_*/

return (final_combo);
endsub;
run;

options cmplib=(sasuser.functions);

data student;
input Student_ID\$ Course\$ Gender\$;
datalines;
1001 Math F
1001 Computer F
1001 Physics F
1002 Math M
1002 Physics M
1003 Math M
1003 Chemstry M
1003 Computer M
1003 Physics M
1004 Chemstry M
1004 Computer M
1004 Physics M
;
run;
proc sort data=student;
by student_id course;
run;

data want(keep=student_id gender crse_combo);
array _crse{10} \$9 _temporary_;
retain nc 0;
set student;
by student_id;
length courses \$1000;

if first.student_id then
do;
call missing(of _crse{*});
nc=0;
end;

nc+1;
_crse{nc}=course;

if ((last.student_id) and (nc > 1));
courses = getCombos(_crse, "~");
*put courses=;
comboCount=countc(courses,'~')+1;
do c=1 to comboCount;
crse_combo=scan(courses,c,'~');
combo_size=countc(crse_combo,',')+1;
crse_combo= put(combo_size,2.) || ': ' || crse_combo;
output;
end;
run;

proc tabulate data=want noseps;
class gender crse_combo;
tables crse_combo=' ',gender=' '*N=' '*f=4.
/ rts=100 box='Size / Course List' misstext='0';
run;``````

Hope this helps,

Ahmed

Discussion stats
• 13 replies
• 418 views
• 1 like
• 8 in conversation