BookmarkSubscribeRSS Feed
Steelers_In_DC
Barite | Level 11

I have a list of 447 students, each of these students have classes that they have taken over the last 4  years.  I am interested in somehow joining all of them to find groups of classes that they have all taken.  I'm not that familiar with hash and I'm new to array.  What I'm thinking is a do loop, from 1 to 447.  I'd like to join 1 to 1 - 447, then 2 to 1 - 447 and so on.  

 

I have a two variable dataset, student_name and course_name, I've added student_num (1-447).  

 

The output would be sorted in descending order by how many similar classes there are alike between students.  If I get through the first iteration I'll most likely do it again and again. I'm doing this in WEKA software for an Applied Stats class, but I thought I'd take the opportunity to learn something in SAS as well.

 

Cheers,

Mark 

7 REPLIES 7
jklaverstijn
Rhodochrosite | Level 12

In essence you are looking for a Cartesian product of a table with itself. This can be done using SQL. An query joining two tables without a WHERE or ON clause will combine every row of the left table with every row of the right one. Usually this happens by mistake (and SAS warns you) but here it's what you want.

 

proc sql;
   select * from A, A;
quit;

You can add an ORDER BY to do the sorting.

 

Hope this helps,
- Jan.

Astounding
PROC Star

It would be helpful to have more information about the classes.

 

How many are there?  With only 10 classes, you can construct as many as 1,000 possible groups.

 

Are you going to try to roll up the groups?  For example, if one student took classes A, B, C, and D, but anothe student took classes A, B, C, and E, will you try to count both of them under the group ABC?

 

 

Steelers_In_DC
Barite | Level 11

There are 130 college courses and 448 students.  The end game is to cluster groups of 3 or 4 classes that have comminality between students.  It's a project where I will recommend minors or concentrations, classes that have a history of students taking them together. 

Reeza
Super User

Google "Market Basket Analysis" to find a macro for base SAS. 

MBA looks at things people purchase together, For example the beer and diaper example from WalMart, which is actually false. 

Astounding
PROC Star

That helps.  Here are some steps I would take.  (Feel free to ask about any of them in more detail.)

 

Come up with a master list of classes, numbering them in an order of your choice 1 - 130.

 

In the students data set, create a character variable 130 characters long.  A set of 0/1 values indicate whether the student took that course or not.  For example, "11010" means the student took classes 1, 2, and 4, but not classes 3 and 5.

 

Go through the students data set, and output each student multiple times.  (I'll use groups of 3 here.)  There will be one observation for each group of 3 classes the student took.  For example:

 

data halfway_there;

set have;

do class1=1 to 128;

   do class2 = class1 + 1 to 129;

      do class3 = class2 + 1 to 130;

         if substr(_130_classes, class1, 1) = '1' and substr(_130_classes, class2, 1)='1' and substr(_130_classes, class3, 1) = 1 

         then output;

      end;

   end;

end;

run;

 

Now a final PROC FREQ will count the occurrences of each group of 3:

 

proc freq data=halfway_there;

tables class1 * class2 * class3 / list;

run;

Shmuel
Garnet | Level 18

According to your statement "... each of these students have classes that they have taken ..." and

assuming that most of them have not taken all classes, then you probably have a table of

classes with the students in each class, that is:

    a table with 2 variables:  class, student name (or student number)

then you can run PROC FREQ like:

      proc freq data=have;

             table class*student / out=freqcount;

     run;

    proc sort data=freqcount; by descending count; run;

    proc print data=freqcount; run;

 

   if you need the list of students having those max freq. classes, it is possible to subset the table by:

       proc sql;

              select h.class, h.student

              from have as h

              left join freqcount (obs=1) as f

              where h.class = f.class and

                        h.student = f.student;

       quit;

Ksharp
Super User

Can you post some data and output to explain your question ?

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1105 views
  • 0 likes
  • 6 in conversation