Hi, I have a list of claim numbers, each with one or more loss codes such as 27, 27C, 27Y, 27O, and 27P. My problem is that I want to get the claim numbers whose loss codes are all something other than 27, i.e. claims that never have the loss code 27. Hope that makes sense.
Claim Number | Loss Code |
CML0001 | 27 |
CML0002 | 27C |
CML0002 | 27 |
CML0003 | 27Y |
CML0003 | 27C |
CML0003 | 27P |
CML0004 | 27O |
CML0004 | 27Y |
CML0004 | 27 |
CML0004 | 27P |
CML0005 | 27P |
CML0005 | 27O |
CML0005 | 27C |
CML0005 | 27 |
CML0006 | 27Y |
CML0006 | 27C |
CML0007 | 27O |
CML0007 | 27Y |
CML0008 | 27O |
CML0009 | 27P |
The claim numbers below should be displayed:
CML0003 |
CML0006 |
CML0007 |
CML0008 |
CML0009 |
Thanks in advance.
By merging the dataset in twice, once keeping only claim_number and once with a WHERE clause on the loss code you don't want, and by keeping only the first. observation per claim_number, this works (as long as '27' is unique per claim number):
data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
loss_code;
cards;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;
run;
data want;
   /* merge the claims with themselves: the second copy keeps only the
      '27' rows, so in_loss flags claims that have a 27 loss code */
   merge claims(keep=claim_number)
         claims(in=in_loss where=(loss_code = '27'));
   by claim_number;
   if not in_loss;          /* keep claims with no 27 loss code */
   if first.claim_number;   /* one output row per claim */
   keep claim_number;
run;
Is that all you want?
That's right on, Laurie 🙂 Thanks. I have duplicate claim numbers with the same loss codes, but I can run a PROC SORT by claim number and loss code first and then execute the same program to get the result.
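A minimal sketch of that dedup step, assuming NODUPKEY is what you have in mind and using a hypothetical output name claims_dedup (not from the thread):

proc sort data=claims out=claims_dedup nodupkey;
   /* keep one row per claim_number/loss_code pair before the merge step */
   by claim_number loss_code;
run;

The merge program can then be pointed at claims_dedup instead of claims.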
Technically speaking, you don't have to dedup where you've got multiple '27's in a claim; you'll get a note in the log about a MERGE statement with repeats of BY values, but in this case it doesn't matter.
If you want to, you can do it in a data step (with or without a view - it's about the same efficiency). I randomly generated 10 million claims and loss codes, and it still ran in under two seconds:
data claim_27v / view=claim_27v;
   /* view holding one row per claim that does have a 27 loss code */
   set claims;
   by claim_number;
   where loss_code = '27';
   if first.claim_number;
   keep claim_number;
run;
data want;
   merge claims(keep=claim_number)
         claim_27v(in=in_loss);
   by claim_number;
   if not in_loss;          /* drop claims that appear in the 27 view */
   if first.claim_number;
   keep claim_number;
run;
data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
loss_code;
cards;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;
run;
proc sql;
   /* sum(loss_code='27') counts the 27 rows within each claim group;
      keep the claims where that count is zero */
   select distinct claim_number
      from claims
      group by claim_number
      having sum(loss_code='27') = 0;
quit;
If you have a very large table, this may be the fastest method.
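For comparison only (this is a sketch I'm adding, not code from the thread), the same result can be written with PROC SQL's EXCEPT set operator:

proc sql;
   /* all claim numbers, minus those with at least one 27 row;
      EXCEPT returns distinct rows by default */
   select claim_number from claims
   except
   select claim_number from claims where loss_code = '27';
quit;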
I think the second SET statement in the step below reuses the cache created by the first SET?
data want;
   retain gotone 0;
   /* first pass over the claim group: flag whether any 27 occurs */
   do until (last.claim_number);
      set claims;
      by claim_number;
      if loss_code='27' then gotone=1;
   end;
   put gotone;
   /* second pass over the same group: output once if no 27 was found */
   do until (last.claim_number);
      set claims;
      by claim_number;
      if last.claim_number and gotone=0 then output;
   end;
   gotone=0;
run;quit;
@rogerjdeangelis I ran the simple merge against your code over 10 million rows - there's nothing in it. Effectively, by reusing the dataset in a merge, the cache is reused as well.
You may like to take the PUT statement out, though - I think you might have log problems!
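If you do want to keep some diagnostic output without flooding the log, one option (my own sketch; the cap of 20 is arbitrary) is to guard the PUT inside that step, e.g.:

   /* write the flag for at most the first 20 claim groups */
   if _n_ <= 20 then put claim_number= gotone=;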
It worked for me, see log below.
The merge and the DOW (two set statements)
had identical times.
I ran 100,000,000 rows; my $600 workstation is too fast for 10,000,000.
data claims;
input claim_number $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;
data want(keep=claim_number);
retain gotone 0;
do until (last.claim_number);
set claims;
by claim_number;
if loss_code='27' then gotone=1;
end;
do until (last.claim_number);
set claims;
by claim_number;
if last.claim_number and gotone =0 then output;
end;
gotone=0;
run;quit;
proc print data=want;
run;quit;
456 data claims;
457 input claim_number $ loss_code $;
458 cards4;
NOTE: The data set WORK.CLAIMS has 20 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
479 ;;;;
480 run;
481 data want(keep=claim_number);
482 retain gotone 0;
483 do until (last.claim_number);
484 set claims;
485 by claim_number;
486 if loss_code='27' then gotone=1;
487 end;
488 do until (last.claim_number);
489 set claims;
490 by claim_number;
491 if last.claim_number and gotone =0 then output;
492 end;
493 gotone=0;
494 run;
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
494 ! quit;
495 proc print data=want;
496 run;
NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
496 ! quit;
Up to 40 obs from WORK.WANT total obs=5
Obs    CLAIM_NUMBER
1      CML0003
2      CML0006
3      CML0007
4      CML0008
5      CML0009
Benchmark
data claims100mm;
   /* 20 input rows x 5,000,000 copies = 100,000,000 rows */
   set claims;
   do rec=1 to 5000000;
      output;
   end;
run;quit;
data want;
merge claims100mm(keep=claim_number)
claims100mm(in=in_loss where=(loss_code = '27'));
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;
542 data want;
543 merge claims100mm(keep=claim_number)
544 claims100mm(in=in_loss where=(loss_code = '27'));
545 by claim_number;
546 if not in_loss;
547 if first.claim_number;
548 keep claim_number;
549 run;
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 20000000 observations read from the data set WORK.CLAIMS100MM.
WHERE loss_code='27';
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 27.63 seconds
cpu time 27.62 seconds
data want(keep=claim_number);
retain gotone 0;
do until (last.claim_number);
set claims100mm;
by claim_number;
if loss_code='27' then gotone=1;
end;
do until (last.claim_number);
set claims100mm;
by claim_number;
if last.claim_number and gotone =0 then output;
end;
gotone=0;
run;
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 27.08 seconds
cpu time 27.00 seconds