Hello,
I am currently using the National Cancer Database and I would like to use the variable PUF_FACILITY_ID to create a new variable based on the frequency a facility ID appears. I've attached a screen shot of the printout of part of the PROC FREQ procedure for this variable.
Using the frequency with which the unique facility ID shows up, I would like to create a new variable based on this frequency. This would be a way to create a "Facility Volume" variable. From this new variable I would like to create percentiles and ultimately a dichotomous variable based off of a percentile cutoff.
Thank you
Since you are already familiar with PROC FREQ, that would be a good tool for the first step:
proc freq data=have noprint;
tables PUF_FACILITY_ID /
out=facility_counts (keep=PUF_FACILITY_ID rename=(count=Facility_Volume));
run;
That gives you a SAS data set holding the facilities, and their volume (first two columns of your picture). For creating percentiles, you can use PROC SUMMARY just to get a limited number of percentiles. But PROC UNIVARIATE gives you much more flexibility as to which percentiles you can calculate.
Show the code you used where the OUT= options did not work and any messages from the log. I suspect you may have inserted a ; wher not needed.
You say you want to merge on PUF_CASE_ID. HOW is that related to PUF_FACILITY_ID? You did not mention that PUF_CASE_ID in the question at all. You will need to describe the relationship between the two variables or did you mean to count PUF_CASE_ID in the first place???
You're right about the error. The KEEP= list should also include COUNT:
out=counts (keep=PUF_FACILITY_ID count rename= (count=Facility_Volume));
And yes, all you can merge by is PUF_FACILITY_ID. Would that not do what you need?
You have a much better chance if link to the facility. I can see a strong likelihood of patients actually being seen at multiple facilities and so matching a single facility count to all records for the the patient could be terribly misleading.
If you actually want the count of the patient visits at that facility then that is something else and would involve, if using Proc Freq a tables statement with both variables.
You might want to post a small example of data, maybe 20 or 30 records. Dummy data is fine as long as it shows a reasonable combination of facility patient data AND what the desired result for that example data would be. You can get instructions here https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... on how to turn a SAS data set into code that we can execute to recreate your data and test code with the data.
If you can rely on the connection of each patient to one particular facility ID, you only need to get counts for each patient. Similarly to what was done previously:
proc freq data=have;
tables patient_id / noprint out=counts (keep=patient_id count rename=(count=patient_volume));
run;
Then merge back into the original data by patient_id.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
ANOVA, or Analysis Of Variance, is used to compare the averages or means of two or more populations to better understand how they differ. Watch this tutorial for more.
Find more tutorials on the SAS Users YouTube channel.