I'm fairly new to SAS and was wondering whether there is a way to delete observations by frequency count. For example, I used proc freq to generate a count of all diagnosis codes, and I want to remove from the dataset every code whose frequency count is <= 5. Is this possible?
There are many ways to get this result. One is to use the out= option on the tables statement in your proc freq. This gives you a dataset of the counts that you can manipulate any way you want:
proc freq data=have; tables variable / out=want; run;
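To use those counts for the actual filtering, one option is to join them back to the original data. The following is a minimal sketch, not a definitive solution: it assumes the input dataset is called have, that the code column is DIAGNOSIS_PRIN_CD, and it uses the hypothetical dataset name code_counts for the proc freq output. COUNT is the variable name proc freq gives the frequency in its out= dataset.

```sas
/* Write one row per diagnosis code, with its frequency in COUNT.    */
/* noprint suppresses the printed table; code_counts is a made-up    */
/* name for the output dataset.                                      */
proc freq data=have noprint;
    tables DIAGNOSIS_PRIN_CD / out=code_counts;
run;

/* Keep only the observations whose code occurs more than 5 times.   */
proc sql;
    create table want as
    select h.*
    from have as h
         inner join code_counts as c
         on h.DIAGNOSIS_PRIN_CD = c.DIAGNOSIS_PRIN_CD
    where c.COUNT > 5;
quit;
```

Keeping the counts in their own dataset makes it easy to inspect which codes fall at or below the threshold before anything is dropped.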
For example,
I have over 500,000 observations in this dataset.
I already have my data filtered for the columns I need but for the diagnosis column, I ran a proc freq.
My Proc Freq data looks like:
DIAGNOSIS_PRIN_CD | Frequency
V3000 | 32677
V3001 | 21109
486 | 15683
389 | 10455
7778 | 2
7915 | 3
7672 | 1
861 | 5
I want to suppress any diagnosis code that has a frequency count <= 5, or replace it with a blank or '.'.
I am thinking about creating a data step and using IF-THEN statements to accomplish this task, but that would be very time-consuming.
Proc freq might not be the best tool for data management. I would use proc sql:
proc sql;
create table want as
select * from have
group by diagnosisCode
having count(*) > 5;
quit;
PG
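The query above removes the low-count observations entirely. If the goal is instead to keep every observation and blank out the rare codes, as the follow-up asks, a variation along these lines might work. It is only a sketch: it assumes DIAGNOSIS_PRIN_CD is a character variable (values such as V3000 suggest so), and DIAGNOSIS_SUPPRESSED is a hypothetical name for the new column.

```sas
/* Keep all rows; copy the code into a new column only when it       */
/* occurs more than 5 times, otherwise leave the new column blank.   */
/* Assumes DIAGNOSIS_PRIN_CD is character; DIAGNOSIS_SUPPRESSED is   */
/* a made-up column name.                                            */
proc sql;
    create table want as
    select h.*,
           case
               when count(*) > 5 then h.DIAGNOSIS_PRIN_CD
               else ' '
           end as DIAGNOSIS_SUPPRESSED
    from have as h
    group by h.DIAGNOSIS_PRIN_CD;
quit;
```

Because the select list mixes ordinary columns with count(*), proc sql remerges the group counts back onto the detail rows and writes a note about this to the log. The group by may also change the row order, so sort afterwards if the original order matters.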