Help needed with excluding low frequency variables from data set

N/A
Posts: 1

OK, bear with me here. I have only been using SAS for two weeks, so I realise some of my sentences will sound like someone trying to learn a new language.

But here goes.

I have gathered data from our kennel club: one data set containing veterinary information (whether the dog has a specific disease, PL = patellar luxation, and if so, which degree, 0-3), and 14 more, one for each of 14 breeds, containing names, registration numbers, lineage, etc. I ended up with a data set containing 250,000 observations and 14 variables.

I have merged these files by breed code, registration number, birth date and sex.

I have also used some IF statements and formats to get what I want. So far so good.

Next, I wanted to see how many examinations each veterinarian had done (variable clinic), and the frequency of each degree (variable degree) for each veterinarian.

So I wrote the following:

proc sort data=plallbreeds;
  by clinic degree;
run;

proc freq data=plallbreeds order=freq;
  tables degree;
  by clinic;
  title 'bla bla bla';
run;

And again, so far so good! The only problem is that I now have 390 veterinarians in my results, and many of them have done fewer than 10 exams.

I would like to exclude these from my output so that I can get a better overview of the results.

I have searched and searched, but I just can't get the code right to do this.

Can someone here help me?

Super User
Posts: 5,431

Re: Help needed with excluding low frequency variables from data set

If you just want the basic frequency count, you could use SQL:

proc sql;
  select clinic, degree, count(*) as Freq
  from plallbreeds
  group by clinic, degree
  order by clinic, degree;
quit;

Data never sleeps
Super User
Posts: 5,513

Re: Help needed with excluding low frequency variables from data set

If you are happy with what you have so far, except that you want to eliminate the low counts, here is a way to modify the approach.  Instead of printing the report with PROC FREQ, create a data set holding the results.  The modification would be:

tables degree / noprint out=clinic_counts;

Then use PROC PRINT to print selected observations.  For example:

proc print data=clinic_counts;
  where count >= 10;
  *by, var, title statements as appropriate;
run;
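Note that in clinic_counts, COUNT is the number of exams for each clinic and degree combination, not the clinic's total, so `where count >= 10` also drops rare degrees from otherwise large clinics. If the goal is to drop only clinics with fewer than 10 exams in total, one option is to filter on the per-clinic count first. This is a minimal sketch, assuming the plallbreeds data set and the clinic and degree variables from the question; big_clinics is a hypothetical name for the filtered data set:

```sas
/* Keep only clinics with at least 10 exams in total.            */
/* big_clinics is a hypothetical name for the filtered data set. */
proc sql;
  create table big_clinics as
  select *
  from plallbreeds
  group by clinic
  having count(*) >= 10;  /* SAS remerges the group count onto each row */
quit;

/* BY-group processing needs the data sorted by the BY variable. */
proc sort data=big_clinics;
  by clinic degree;
run;

/* Same report as before, now limited to the larger clinics. */
proc freq data=big_clinics order=freq;
  tables degree;
  by clinic;
run;
```

The log will show a NOTE about remerging summary statistics; that is expected here. Every degree row is kept for the remaining clinics, while the OUT=/WHERE approach is still useful if you also want to hide rare degrees.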

Good luck.
