Remove multiple observations, aggregate on studyID and dispensing date, sum payment variable

jusjolly — Tue, 03 May 2022 21:41:38 GMT

Hello,

First time poster so apologies if I am unclear.

I am working with a large study cohort which has some claims data. In the data, there can be multiple (duplicate) observations per day. I want to define a duplicate as an observation with the same studyID and dispensing date and aggregate the data based on these two variables. I also want to sum the payment variables ('pay', 'ded') that occur in these separate observations. I want to keep some of the other variables in the data as well (eg diag1, diag2, age, region, SES).

I have tried:

proc sql;
create table test as select distinct
studyid, diag1, diag2, age, region, SES, dispensedate,
(sum(pay)) as totalpay,
(sum(ded)) as totalded
from studydata
group by studyid, dispensedate;
quit;

Looking at the proc means min/max, I do not think the payment variables were summed...

Re: Remove multiple observations, aggregate on studyID and dispensing date, sum payment variable

Reeza — Tue, 03 May 2022 21:47:13 GMT

Why do you think the payments were not summed?

If you select variables that are neither in the GROUP BY clause nor inside an aggregate function (e.g. diag1, diag2), SAS remerges the summary statistics with the original data. The log shows a note about remerging when this happens. Each detail row then carries its group's totals, so you still get more than one row per studyID and dispensing date, and DISTINCT only collapses rows that are identical on every selected column. If you expect DIAG1/DIAG2 to remain constant within each group, add them to the GROUP BY clause. If you do not expect them to be consistent, you need to define rules for which value to keep, and the solution will vary based on those rules.
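
The two cases might look roughly like the sketches below. These are untested illustrations, not a definitive fix: they reuse the table and variable names from the question, the output names test2 and dup_check are made up here, and the MAX() rule in the second query is only a placeholder for whatever rule actually suits the data.

```sas
/* Case 1: assumes diag1, diag2, age, region and SES never vary
   within a studyid/dispensedate group, so they can join the GROUP BY. */
proc sql;
  create table test as
  select studyid, dispensedate, diag1, diag2, age, region, SES,
         sum(pay) as totalpay,
         sum(ded) as totalded
  from studydata
  group by studyid, dispensedate, diag1, diag2, age, region, SES;
quit;

/* Case 2: the other variables may vary within a group, so every one of
   them needs an explicit rule. MAX() is an arbitrary illustration only. */
proc sql;
  create table test2 as
  select studyid, dispensedate,
         max(diag1)  as diag1,
         max(diag2)  as diag2,
         max(age)    as age,
         max(region) as region,
         max(SES)    as SES,
         sum(pay)    as totalpay,
         sum(ded)    as totalded
  from studydata
  group by studyid, dispensedate;
quit;

/* Check: list any studyid/dispensedate pair that still has more than one row. */
proc sql;
  create table dup_check as
  select studyid, dispensedate, count(*) as n_rows
  from test
  group by studyid, dispensedate
  having count(*) > 1;
quit;
```

If dup_check comes back empty for the first query, the other variables really were constant within each group. If it has rows, those are the groups where a rule like the one in the second query is needed.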

