Hello
I am working with claims data from a health insurance. My dataset provides information about every claims from their insured patients, which means patients may appear more than once in the dataset if they have more than one claim. Patients are identified by their patient-ID
I would like to find out how many different patients I have in my dataset, meaning I would like to know how many different patient-IDs I have in my dataset.
If I try
proc freq data=dataset; tables patient_id; run;
it just shows me how often a certain patient-ID appears in the dataset, but not how many different patient-IDs there are.
Any suggestions, how I could easily figure this out with a simple command?
Thank you!
The NLEVELS option of Proc Freq will supply a short summary of the number of "levels", i.e. values, for a variable and some info about missing, if any. Use the ODS SELECT to limit output to just that table:
Example with a data set you should be able to test the code with:
ods select nlevels; proc freq data=sashelp.class nlevels; tables sex; run;
Which shows a result of:
Not the number of observations, just the number of values. There is no information about missing. That means that variable had no missing values for the variable.
Here is an example of counting the number of distinct AGE variable values in sashelp.class:
data _null_;
if 0 then set sashelp.class;
declare hash h (dataset:'sashelp.class (keep=age)');
h.definekey(all:'Y');
h.definedata(all:'Y');
h.definedone();
n_of_distinct_ages=h.num_items;
put n_of_distinct_ages=;
run;
You can do the same with the ID variable in your dataset.
The NLEVELS option of Proc Freq will supply a short summary of the number of "levels", i.e. values, for a variable and some info about missing, if any. Use the ODS SELECT to limit output to just that table:
Example with a data set you should be able to test the code with:
ods select nlevels; proc freq data=sashelp.class nlevels; tables sex; run;
Which shows a result of:
Not the number of observations, just the number of values. There is no information about missing. That means that variable had no missing values for the variable.
Hi
Is it also possible to count the number of "levels" depending on another variable in the dataset? For example, I would like to count number of different patient-IDs within each month (which is another variable).
Otherwise, it will only count number of different patient-IDs within the whole dataset, but I would like it per month, so a patient-ID can count as unique each month it appears in and not only once in the dataset.
Thank you!
@Tamino wrote:
Hi
Is it also possible to count the number of "levels" depending on another variable in the dataset? For example, I would like to count number of different patient-IDs within each month (which is another variable).
Otherwise, it will only count number of different patient-IDs within the whole dataset, but I would like it per month, so a patient-ID can count as unique each month it appears in and not only once in the dataset.
Thank you!
If you use BY group processing you get an Nlevels report for each BY group. So sort your data by whatever grouping variable you want and use that as the BY variable in proc Freq and request the other variable(s) on the Table statement.
Save $250 on SAS Innovate and get a free advance copy of the new SAS For Dummies book! Use the code "SASforDummies" to register. Don't miss out, May 6-9, in Orlando, Florida.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.