BookmarkSubscribeRSS Feed
jusjolly
Calcite | Level 5

Hello,

 

First time poster so apologies if I am unclear.

 

I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).

 

I have tried: 

proc sql;

create table test as select distinct

studyid, diag1, diag2, age, region, SES, dispensedate,

(sum(pay)) as totalpay,

(sum(ded)) as totalded,

from studydata

group by studyid, dispensedate;

quit;

 

Looking at the proc means min/max, I do not think the payment variables were summed... 

 

1 REPLY 1
Reeza
Super User

Why do you not think the payment as summed?

 

If you include variables that are not included in the GROUP BY statement or an aggregate calculation (ie diag1, diag2) it will cause SAS to merge the summary data with the original data causing you to have duplicates. If you expect DIAG1/DIAG2 to remain constant over the grouping you should add them to the GROUP BY statement. If you do not expect them to be consistent you need to define rules on which one to take and the solution will vary based on those rules.

 


@jusjolly wrote:

Hello,

 

First time poster so apologies if I am unclear.

 

I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).

 

I have tried: 

proc sql;

create table test as select distinct

studyid, diag1, diag2, age, region, SES, dispensedate,

(sum(pay)) as totalpay,

(sum(ded)) as totalded,

from studydata

group by studyid, dispensedate;

quit;

 

Looking at the proc means min/max, I do not think the payment variables were summed... 

 


 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 1 reply
  • 660 views
  • 0 likes
  • 2 in conversation