BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Tamino
Obsidian | Level 7

Hello

I am working with claims data from a health insurance. My dataset provides information about every claims from their insured patients, which means patients may appear more than once in the dataset if they have more than one claim. Patients are identified by their patient-ID

 

I would like to find out how many different patients I have in my dataset, meaning I would like to know how many different patient-IDs I have in my dataset.

 

If I try 

proc freq data=dataset; tables patient_id; run;

it just shows me how often a certain patient-ID appears in the dataset, but not how many different patient-IDs there are.

 

Any suggestions, how I could easily figure this out with a simple command?

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

The NLEVELS option of Proc Freq will supply a short summary of the number of "levels", i.e. values, for a variable and some info about missing, if any. Use the ODS SELECT to limit output to just that table:

 

Example with a data set you should be able to test the code with:

ods select nlevels;
proc freq data=sashelp.class nlevels;
tables sex;
run;

Which shows a result of:

The FREQ Procedure
Number of Variable
Levels
Variable Levels
Sex 2

Not the number of observations, just the number of values. There is no information about missing. That means that variable had no missing values for the variable.

View solution in original post

7 REPLIES 7
mkeintz
PROC Star

Here is an example of counting the number of distinct AGE variable values in sashelp.class:

 

data _null_;
  if 0 then set sashelp.class;
  declare hash h (dataset:'sashelp.class (keep=age)');
    h.definekey(all:'Y');
    h.definedata(all:'Y');
    h.definedone();
  n_of_distinct_ages=h.num_items;
  put n_of_distinct_ages=;
run;

You can do the same with the ID variable in your dataset.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

The NLEVELS option of Proc Freq will supply a short summary of the number of "levels", i.e. values, for a variable and some info about missing, if any. Use the ODS SELECT to limit output to just that table:

 

Example with a data set you should be able to test the code with:

ods select nlevels;
proc freq data=sashelp.class nlevels;
tables sex;
run;

Which shows a result of:

The FREQ Procedure
Number of Variable
Levels
Variable Levels
Sex 2

Not the number of observations, just the number of values. There is no information about missing. That means that variable had no missing values for the variable.

Tamino
Obsidian | Level 7
Thank you! That works quite easy.
Tamino
Obsidian | Level 7

Hi

 

Is it also possible to count the number of "levels" depending on another variable in the dataset? For example, I would like to count number of different patient-IDs within each month (which is another variable).

 

Otherwise, it will only count number of different patient-IDs within the whole dataset, but I would like it per month, so a patient-ID can count as unique each month it appears in and not only once in the dataset.

 

Thank you!

ballardw
Super User

@Tamino wrote:

Hi

 

Is it also possible to count the number of "levels" depending on another variable in the dataset? For example, I would like to count number of different patient-IDs within each month (which is another variable).

 

Otherwise, it will only count number of different patient-IDs within the whole dataset, but I would like it per month, so a patient-ID can count as unique each month it appears in and not only once in the dataset.

 

Thank you!


If you use BY group processing you get an Nlevels report for each BY group. So sort your data by whatever grouping variable you want and use that as the BY variable in proc Freq and request the other variable(s) on the Table statement.

Tamino
Obsidian | Level 7
Thank you, I forgot to sort it before, that's why it didn't work.

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 906 views
  • 2 likes
  • 3 in conversation