12-22-2016 01:35 PM
I am currently using the National Cancer Database and I would like to use the variable PUF_FACILITY_ID to create a new variable based on the frequency a facility ID appears. I've attached a screen shot of the printout of part of the PROC FREQ procedure for this variable.
Using the frequency with which the unique facility ID shows up, I would like to create a new variable based on this frequency. This would be a way to create a "Facility Volume" variable. From this new variable I would like to create percentiles and ultimately a dichotomous variable based off of a percentile cutoff.
12-22-2016 02:43 PM
Since you are already familiar with PROC FREQ, that would be a good tool for the first step:
proc freq data=have noprint;
tables PUF_FACILITY_ID /
out=facility_counts (keep=PUF_FACILITY_ID rename=(count=Facility_Volume));
That gives you a SAS data set holding the facilities, and their volume (first two columns of your picture). For creating percentiles, you can use PROC SUMMARY just to get a limited number of percentiles. But PROC UNIVARIATE gives you much more flexibility as to which percentiles you can calculate.
12-22-2016 05:01 PM
12-22-2016 05:39 PM
Show the code you used where the OUT= options did not work and any messages from the log. I suspect you may have inserted a ; wher not needed.
You say you want to merge on PUF_CASE_ID. HOW is that related to PUF_FACILITY_ID? You did not mention that PUF_CASE_ID in the question at all. You will need to describe the relationship between the two variables or did you mean to count PUF_CASE_ID in the first place???
12-22-2016 07:25 PM
You're right about the error. The KEEP= list should also include COUNT:
out=counts (keep=PUF_FACILITY_ID count rename= (count=Facility_Volume));
And yes, all you can merge by is PUF_FACILITY_ID. Would that not do what you need?
12-23-2016 09:26 AM
12-23-2016 10:55 AM
You have a much better chance if link to the facility. I can see a strong likelihood of patients actually being seen at multiple facilities and so matching a single facility count to all records for the the patient could be terribly misleading.
If you actually want the count of the patient visits at that facility then that is something else and would involve, if using Proc Freq a tables statement with both variables.
You might want to post a small example of data, maybe 20 or 30 records. Dummy data is fine as long as it shows a reasonable combination of facility patient data AND what the desired result for that example data would be. You can get instructions here https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat... on how to turn a SAS data set into code that we can execute to recreate your data and test code with the data.
12-23-2016 02:30 PM
If you can rely on the connection of each patient to one particular facility ID, you only need to get counts for each patient. Similarly to what was done previously:
proc freq data=have;
tables patient_id / noprint out=counts (keep=patient_id count rename=(count=patient_volume));
Then merge back into the original data by patient_id.