
Summing multiple observations

Contributor
Posts: 27

I have a dataset where the same patients enter at several points. I can uniquely identify each patient every time they enter the dataset, and each entry records the cost of that stay. I want to sum the costs for each patient, but I'm not sure how. Thanks!


Solution
06-29-2015 01:17 PM
Super User
Posts: 10,550

Re: Summing multiple observations

One way:

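proc summary data=have nway;                 /* NWAY: only the per-patient rows */
     class patientid;                        /* one output row per patient */
     var costofstay;                         /* the cost variable to total */
     output out=want (drop=_type_) sum=;     /* SUM= keeps the name costofstay */
run;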

The output data set will also contain a variable _FREQ_ holding the number of records for each patient.
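A minimal self-contained sketch of the same step, assuming the variable names patientid and costofstay used in the reply. The HAVE rows are made-up values for illustration only, not data from the thread.

```sas
/* Hypothetical sample data (made-up values): several stays per patient. */
data have;
   input patientid costofstay;
   datalines;
1 100
1 250
2 400
3 50
3 75
3 125
;
run;

/* One output row per patientid:
   _FREQ_ = number of stays, costofstay = total cost of those stays. */
proc summary data=have nway;
   class patientid;
   var costofstay;
   output out=want (drop=_type_) sum=;
run;

proc print data=want noobs;
run;
```

With these made-up values, WANT should hold three rows: patient 1 with _FREQ_=2 and costofstay=350, patient 2 with _FREQ_=1 and costofstay=400, and patient 3 with _FREQ_=3 and costofstay=250.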

Contributor
Posts: 27

Re: Summing multiple observations

Thanks for the response. I'm not sure whether it's telling me that I'm missing the vast majority of my patientids, since the _freq_ on the row with a missing patientid covers the majority of my data. However, I know I'm not missing anywhere near that many patient IDs. Also, for sum=, does the name of the variable being summed need to go there?

Thanks!

Super User
Posts: 10,550

Re: Summing multiple observations

When you use SUM= (or another statistic keyword) without a name after it in PROC SUMMARY or PROC MEANS, the output variable keeps the input variable's name. That only works for ONE statistic per variable, though. You can add the /AUTONAME option (for example, sum= / autoname) to have each output variable named from the original variable with the statistic keyword appended, such as costofstay_Sum, if you want.
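A short sketch of both naming styles, on the same kind of hypothetical HAVE data (made-up values). The output data set names want_auto and want_named, and the explicit names total_cost and avg_cost, are illustrative choices, not anything from the thread.

```sas
/* Hypothetical sample data (made-up values). */
data have;
   input patientid costofstay;
   datalines;
1 100
1 250
2 400
;
run;

/* Several statistics at once: AUTONAME builds the output names
   costofstay_Sum, costofstay_Mean and costofstay_N. */
proc summary data=have nway;
   class patientid;
   var costofstay;
   output out=want_auto (drop=_type_) sum= mean= n= / autoname;
run;

/* Alternative: give each statistic an explicit output name. */
proc summary data=have nway;
   class patientid;
   var costofstay;
   output out=want_named (drop=_type_) sum=total_cost mean=avg_cost;
run;
```

AUTONAME is convenient when many variables are listed on VAR; explicit names are clearer when only one or two statistics are needed.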

PatientId should be either a single variable or the combination of variables you use to "uniquely identify each patient".

If you are getting a _FREQ_ value for a missing PatientId, that means you didn't include the NWAY option on the PROC statement. NWAY tells the procedure to output only the rows that combine all of the class variables.
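To see where that row comes from, here is a sketch (hypothetical data, variable names from the thread) that leaves out NWAY and keeps _TYPE_. The data set names want_all and want_by_patient are illustrative.

```sas
/* Hypothetical sample data (made-up values). */
data have;
   input patientid costofstay;
   datalines;
1 100
1 250
2 400
;
run;

/* No NWAY, _TYPE_ kept: besides the per-patient rows (_TYPE_=1) there is
   one extra row with _TYPE_=0, patientid missing, and _FREQ_ equal to the
   total number of records, i.e. the grand total. */
proc summary data=have;
   class patientid;
   var costofstay;
   output out=want_all sum=;
run;

/* With a single class variable, keeping only _TYPE_=1 gives the same
   rows that NWAY would have produced. */
data want_by_patient;
   set want_all;
   where _type_ = 1;
   drop _type_;
run;
```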

For example, you might have a data set with State, County and City. With CLASS State County City; the default output generates summaries for:

All records
State
County
City
State and County
State and City
County and City
State, County and City

The _TYPE_ variable (the one dropped in the OUTPUT statement above) tells which of these combinations each output row represents.

With NWAY, only the State, County and City combination appears in the output.
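A small sketch of the State/County/City case. PLACES, its AMOUNT variable, and all values are hypothetical. The comment shows how _TYPE_ encodes each combination: it is a bit pattern over the CLASS variables, with the last CLASS variable (City) as the lowest bit.

```sas
/* Hypothetical data set with made-up values. */
data places;
   length state $2 county $10 city $12;
   input state $ county $ city $ amount;
   datalines;
NC Wake Raleigh 10
NC Wake Cary 20
NC Durham Durham 5
VA Fairfax Reston 7
;
run;

/* Default: all 2**3 = 8 combinations are written.
   _TYPE_ 0 = all records,  1 = city,         2 = county,
          3 = county*city,  4 = state,        5 = state*city,
          6 = state*county, 7 = state*county*city. */
proc summary data=places;
   class state county city;
   var amount;
   output out=all_levels sum=;
run;

/* NWAY: only the _TYPE_ = 7 (state*county*city) rows are written. */
proc summary data=places nway;
   class state county city;
   var amount;
   output out=full_only sum=;
run;
```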

Contributor
Posts: 27

Re: Summing multiple observations

I added the nway and it looks like that fixed it. I'm not entirely sure why, since the only class variable I have is patientid. Patientid is a single variable that is unique to each patient. Any thoughts?

Super User
Posts: 10,550

Re: Summing multiple observations

That is CLASS in PROC SUMMARY behaving EXACTLY as intended: by default it provides a summary of ALL records plus one for each level of patientid. Believe me, it is a useful feature: there are a couple of ways either to specify which combinations are output or to summarize the data once and then select which summary to use for different purposes.
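One way to specify the output combinations is the TYPES or WAYS statement. A sketch on the same hypothetical PLACES data (made-up values); the output names chosen and one_way are illustrative.

```sas
/* Hypothetical data set with made-up values. */
data places;
   length state $2 county $10 city $12;
   input state $ county $ city $ amount;
   datalines;
NC Wake Raleigh 10
NC Wake Cary 20
NC Durham Durham 5
VA Fairfax Reston 7
;
run;

/* TYPES lists the exact combinations to output:
   () = overall total, state alone, and state*county. */
proc summary data=places;
   class state county city;
   var amount;
   types () state state*county;
   output out=chosen sum=;
run;

/* WAYS outputs every combination of the given number of class variables;
   here all one-way summaries (state, county and city separately). */
proc summary data=places;
   class state county city;
   var amount;
   ways 1;
   output out=one_way sum=;
run;
```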

I have one project where I produce similar tables for data with 7 class variables, reported for 5 of them individually and for another 20+ combinations of 2 to 4 of the class variables. PROC SUMMARY gives me one data set, and I use the _TYPE_ variable to select the summary needed for each section of the report.
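A minimal sketch of the "summarize once, then select" approach, again on hypothetical PLACES data. It uses the CHARTYPE option so that _TYPE_ becomes a string of 0/1 flags, one per CLASS variable in order; the output names summary_all, by_state and by_state_county are illustrative.

```sas
/* Hypothetical data set with made-up values. */
data places;
   length state $2 county $10 city $12;
   input state $ county $ city $ amount;
   datalines;
NC Wake Raleigh 10
NC Wake Cary 20
NC Durham Durham 5
VA Fairfax Reston 7
;
run;

/* One pass over the data; CHARTYPE makes _TYPE_ character,
   e.g. '100' = state only, '110' = state*county. */
proc summary data=places chartype;
   class state county city;
   var amount;
   output out=summary_all sum=;
run;

/* Pull the pieces each report section needs from the single summary. */
data by_state by_state_county;
   set summary_all;
   if _type_ = '100' then output by_state;
   else if _type_ = '110' then output by_state_county;
run;
```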
