I have a file with over 187,000 records:
Claim number
Sequence number
Date of service
Provider number
Claim Status
The issue is when I try to distinct the list using MAX and DISTINCT - I get the same results.
Mock sample of data
claim # Seq # DOS
123456 001 01/01/2013
123456 002 01/01/2013
121212 001 02/01/2013
555555 001 01/01/2013
555555 002 01/01/2013
555555 003 01/01/2013
In Access this would simply require a sort on claim # and Seq # and then use the Last function to get the following output:
123456 002 01/01/2013
121212 001 02/01/2013
555555 003 01/01/2013
Thanks in advance
Hi,
I am a little confused with your output requirement. Is it really as simple as this that you want?-
data have; | |
input claim $ seq $ DOS ddmmyy10.; | |
format dos ddmmyy10.; | |
datalines; | |
123456 001 01/01/2013 | |
123456 002 01/01/2013 | |
121212 001 02/01/2013 | |
555555 001 01/01/2013 | |
555555 002 01/01/2013 | |
555555 003 01/01/2013 | |
; | |
run; | |
proc sort data= have; | |
by claim descending seq; | |
run; | |
data want; | |
set have; | |
by claim; | |
if first.claim; | |
run; |
You can do some similar stuff like you mentioned sorting Access. Since it looks too simple to be true, I fear my understanding is perhaps not right. Give the above a try. No harm:) lol
Thanks,
Naveen
Hi,
Please try
proc sql;
create table want as select distinct claim,max(seq) format=z3. ,dos from have
group by claim,dos;
quit;
Thanks,
Jagadish
When I run the following I'm still getting unwanted lines:
claim # seq# dos prvd# clmstatus paymentstatus
2013499501 1 09AUG2013 DR0891 31 11
2013499501 2 09AUG2013: DR0891 31 31
In the above data (actual data but modified for HIPPA) the line of data I should see only is:
2013499501 2 09AUG2013: DR0891 31 31
The claimnumber, provider# and dos will repeat, but the seq# increments so I need to make sure I pull the latest seq#. Other variables will change within the dataset, which is why the distinct isnt working
PROC SQL;
CREATE TABLE WORK.QUERY_FOR_ENCOUNTER AS
SELECT DISTINCT t1.CLAIMNUMBER,
(MAX(t1.SEQNUM)) AS SEQNUM,
t1.BEGSERVICEDATE,
t1.PROVNUM,
t1.CLAIMSTATUS,
t1.PAYMENTSTATUS
FROM WORK.ENCOUNTER t1
GROUP BY t1.CLAIMNUMBER, t1.BEGSERVICEDATE;
QUIT;
how about this?
proc sql;
create table work.tmp as
select
claimnumber,
beginservicedate,
max(seqnum) as LastSeq
from
work.encounter
group by claimnumber,beginservicedate;
create table work.want as
select t1.*
from
work.encounter t1
inner join (
select
claimnumber,
beginservicedate,
max(seqnum) as LastSeq
from
work.encounter
group by claimnumber,beginservicedate
) t2
on t1.claimnumber=t2.claimnumber
and t1.beginservicedate=t2.beginservicedate;
quit;
you might be able to insert the tmp step into the query but I wasn't sure about your performance.
proc sql;
create table work.want as
select t1.*
from
work.encounter t1
inner join work.tmp t2
on t1.claimnumber=t2.claimnumber
and t1.beginservicedate=t2.beginservicedate;
quit;
another option....
proc sql;
create table work.want as
select * from work.encounter t1
where
seqnum=(select max(seqnum) from work.encounter where claimnumber=t1.claimnumber);
quit;
I'm looking into both
Hi,
I am a little confused with your output requirement. Is it really as simple as this that you want?-
data have; | |
input claim $ seq $ DOS ddmmyy10.; | |
format dos ddmmyy10.; | |
datalines; | |
123456 001 01/01/2013 | |
123456 002 01/01/2013 | |
121212 001 02/01/2013 | |
555555 001 01/01/2013 | |
555555 002 01/01/2013 | |
555555 003 01/01/2013 | |
; | |
run; | |
proc sort data= have; | |
by claim descending seq; | |
run; | |
data want; | |
set have; | |
by claim; | |
if first.claim; | |
run; |
You can do some similar stuff like you mentioned sorting Access. Since it looks too simple to be true, I fear my understanding is perhaps not right. Give the above a try. No harm:) lol
Thanks,
Naveen
Hmmmmm - I didn't know there is a first.function available in SAS. I'm still new to the whole thing. Basically I need to be able to grab the latest line of data based on the last sequence regardless of how many other variables are in the same column and if they are distinct or non distinct of one another.
I'll test out your dataset and let you know
the sorting option is fine...unless the data resides in an external database. If it does, then SAS will create a local copy for sorting. That might not be an issue for you but it can be a time/disk intensive process depending on how data volume.
True, 187,000 records seems no big deal to have a local copy i guess. But when we take time/disk into consideration, I guess proc sql will help-
proc sql; | |
create table want1 as | |
select claim, max(seq) as max_seq,dos | |
from have | |
group by claim, dos | |
order by claim,dos; |
quit;
The proc sort and data step worked great. I couldn't get the proc sql to work due to the other variables having distinct data.
The proc sort and data set ignores all other columns and only looks at the field placed after the FIRST.
CLAIMNUMBER,
SUBCLM,
SEQNUM,
BEGSERVICEDATE,
PROVNUM,
PROVLOC,
FACILITYNAME,
FFSORCAPIND,
CLAIMSTATUS,
PAYMENTSTATUS
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.