Solved: Re: MAX and Distinct

Dsrountree · Posted 12-26-2013 11:22 AM

I have a file with over 187,000 records:

Claim number

Sequence number

Date of service

Provider number

Claim Status

The issue is when I try to distinct the list using MAX and DISTINCT - I get the same results.

Mock sample of data

claim # Seq # DOS

123456 001 01/01/2013

123456 002 01/01/2013

121212 001 02/01/2013

555555 001 01/01/2013

555555 002 01/01/2013

555555 003 01/01/2013

In Access this would simply require a sort on claim # and Seq # and then use the Last function to get the following output:
123456 002 01/01/2013

121212 001 02/01/2013

555555 003 01/01/2013

Thanks in advance

NaveenSrinivasan · Posted 12-26-2013 01:24 PM

Hi,

I am a little confused with your output requirement. Is it really as simple as this that you want?-

data have;
input claim $ seq $ DOS ddmmyy10.;
format dos ddmmyy10.;
datalines;
123456 001 01/01/2013
123456 002 01/01/2013
121212 001 02/01/2013
555555 001 01/01/2013
555555 002 01/01/2013
555555 003 01/01/2013
;
run;


proc sort data= have;
by claim descending seq;
run;

data want;
set have;
by claim;
if first.claim;
run;

You can do some similar stuff like you mentioned sorting Access. Since it looks too simple to be true, I fear my understanding is perhaps not right. Give the above a try. No harm:) lol

Thanks,

Naveen

View solution in original post

Jagadishkatam · Posted 12-26-2013 11:38 AM

Hi,

Please try

proc sql;

create table want as select distinct claim,max(seq) format=z3. ,dos from have

group by claim,dos;

quit;

Thanks,

Jagadish

Thanks,
Jag

Dsrountree · Posted 12-26-2013 12:34 PM

When I run the following I'm still getting unwanted lines:

claim # seq# dos prvd# clmstatus paymentstatus

2013499501 1 09AUG2013 DR0891 31 11

2013499501 2 09AUG2013: DR0891 31 31

In the above data (actual data but modified for HIPPA) the line of data I should see only is:

2013499501 2 09AUG2013: DR0891 31 31

The claimnumber, provider# and dos will repeat, but the seq# increments so I need to make sure I pull the latest seq#. Other variables will change within the dataset, which is why the distinct isnt working

PROC SQL;

CREATE TABLE WORK.QUERY_FOR_ENCOUNTER AS

SELECT DISTINCT t1.CLAIMNUMBER,

(MAX(t1.SEQNUM)) AS SEQNUM,

t1.BEGSERVICEDATE,

t1.PROVNUM,

t1.CLAIMSTATUS,

t1.PAYMENTSTATUS

FROM WORK.ENCOUNTER t1

GROUP BY t1.CLAIMNUMBER, t1.BEGSERVICEDATE;

QUIT;

DBailey · Posted 12-26-2013 12:59 PM

how about this?

proc sql;

create table work.tmp as

select

claimnumber,

beginservicedate,

max(seqnum) as LastSeq

from

work.encounter

group by claimnumber,beginservicedate;

create table work.want as

select t1.*

from

work.encounter t1

inner join (

select

claimnumber,

beginservicedate,

max(seqnum) as LastSeq

from

work.encounter

group by claimnumber,beginservicedate

) t2

on t1.claimnumber=t2.claimnumber

and t1.beginservicedate=t2.beginservicedate;

quit;

you might be able to insert the tmp step into the query but I wasn't sure about your performance.

proc sql;

create table work.want as

select t1.*

from

work.encounter t1

inner join work.tmp t2

on t1.claimnumber=t2.claimnumber

and t1.beginservicedate=t2.beginservicedate;

quit;

DBailey · Posted 12-26-2013 01:15 PM

another option....

proc sql;

create table work.want as

select * from work.encounter t1

where

seqnum=(select max(seqnum) from work.encounter where claimnumber=t1.claimnumber);

quit;

Dsrountree · Posted 12-26-2013 01:37 PM

I'm looking into both

NaveenSrinivasan · Posted 12-26-2013 01:24 PM

Hi,

I am a little confused with your output requirement. Is it really as simple as this that you want?-

data have;
input claim $ seq $ DOS ddmmyy10.;
format dos ddmmyy10.;
datalines;
123456 001 01/01/2013
123456 002 01/01/2013
121212 001 02/01/2013
555555 001 01/01/2013
555555 002 01/01/2013
555555 003 01/01/2013
;
run;


proc sort data= have;
by claim descending seq;
run;

data want;
set have;
by claim;
if first.claim;
run;

You can do some similar stuff like you mentioned sorting Access. Since it looks too simple to be true, I fear my understanding is perhaps not right. Give the above a try. No harm:) lol

Thanks,

Naveen

Dsrountree · Posted 12-26-2013 01:40 PM

Hmmmmm - I didn't know there is a first.function available in SAS. I'm still new to the whole thing. Basically I need to be able to grab the latest line of data based on the last sequence regardless of how many other variables are in the same column and if they are distinct or non distinct of one another.

I'll test out your dataset and let you know

DBailey · Posted 12-26-2013 04:08 PM

the sorting option is fine...unless the data resides in an external database. If it does, then SAS will create a local copy for sorting. That might not be an issue for you but it can be a time/disk intensive process depending on how data volume.

NaveenSrinivasan · Posted 12-26-2013 10:22 PM

True, 187,000 records seems no big deal to have a local copy i guess. But when we take time/disk into consideration, I guess proc sql will help-

proc sql;
create table want1 as
select claim, max(seq) as max_seq,dos
from have
group by claim, dos
order by claim,dos;

quit;

Dsrountree · Posted 12-27-2013 10:01 AM

The proc sort and data step worked great. I couldn't get the proc sql to work due to the other variables having distinct data.

The proc sort and data set ignores all other columns and only looks at the field placed after the FIRST.

CLAIMNUMBER,

SUBCLM,

SEQNUM,

BEGSERVICEDATE,

PROVNUM,

PROVLOC,

FACILITYNAME,

FFSORCAPIND,

CLAIMSTATUS,

PAYMENTSTATUS

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away