Solved
Contributor
Posts: 29

# Urgent Help

Hi, I have a list of claim numbers, each with different loss codes such as 27, 27C, 27Y, 27O, 27P. My problem is that I want to get the claim numbers whose loss codes never include plain 27. Hope it makes sense.

| Claim Number | Loss Code |
| --- | --- |
| CML0001 | 27 |
| CML0002 | 27C |
| CML0002 | 27 |
| CML0003 | 27Y |
| CML0003 | 27C |
| CML0003 | 27P |
| CML0004 | 27O |
| CML0004 | 27Y |
| CML0004 | 27 |
| CML0004 | 27P |
| CML0005 | 27P |
| CML0005 | 27O |
| CML0005 | 27C |
| CML0005 | 27 |
| CML0006 | 27Y |
| CML0006 | 27C |
| CML0007 | 27O |
| CML0007 | 27Y |
| CML0008 | 27O |
| CML0009 | 27P |

The claim numbers below should be displayed:

CML0003, CML0006, CML0007, CML0008, CML0009

Accepted Solutions
Solution
02-27-2017 04:30 AM
Super Contributor
Posts: 268

## Re: Urgent Help

By merging the dataset in twice, the second time with a WHERE clause selecting the loss code you want to exclude, and by taking first.claim_number, this works (as long as '27' is unique per claim number):

```
data claims;
  infile cards dsd dlm='09'x;
  attrib claim_number length=$7;
  attrib loss_code length=$3;
  input claim_number loss_code;
  cards;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
;
run;

data want;
  merge claims(keep=claim_number)
        claims(in=in_loss where=(loss_code = '27'));
  by claim_number;
  if not in_loss;
  if first.claim_number;
  keep claim_number;
run;
```

Is that all you want?
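For anyone who wants to sanity-check the expected answer outside SAS, the same exclusion can be sketched in plain Python (a hypothetical cross-check, not part of the SAS solution): collect the loss codes per claim and keep only the claims whose codes never include a plain '27'.

```python
# Sample rows from the original post: (claim_number, loss_code).
rows = [
    ("CML0001", "27"),  ("CML0002", "27C"), ("CML0002", "27"),
    ("CML0003", "27Y"), ("CML0003", "27C"), ("CML0003", "27P"),
    ("CML0004", "27O"), ("CML0004", "27Y"), ("CML0004", "27"),
    ("CML0004", "27P"), ("CML0005", "27P"), ("CML0005", "27O"),
    ("CML0005", "27C"), ("CML0005", "27"),  ("CML0006", "27Y"),
    ("CML0006", "27C"), ("CML0007", "27O"), ("CML0007", "27Y"),
    ("CML0008", "27O"), ("CML0009", "27P"),
]

# Gather the set of loss codes seen for each claim.
codes_by_claim = {}
for claim, code in rows:
    codes_by_claim.setdefault(claim, set()).add(code)

# Keep claims that never had a plain '27'.
want = sorted(c for c, codes in codes_by_claim.items() if "27" not in codes)
print(want)  # ['CML0003', 'CML0006', 'CML0007', 'CML0008', 'CML0009']
```

This reproduces the five claim numbers listed in the question.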

All Replies

Contributor
Posts: 29

## Re: Urgent Help

That's right on, Laurie, thanks. I have duplicate claim numbers with the same loss codes, but I can PROC SORT by claim number and loss code first and run the same program to get the result.

Super Contributor
Posts: 268

## Re: Urgent Help

Technically speaking, you don't have to dedup where you've got multiple '27's in a claim; you'll get a note in the log about a MERGE statement with repeats of BY values, but in this case it doesn't matter.

If you want to, you can do it in a data step (with or without a view; the efficiency is about the same either way). I randomly generated 10 million claims and loss codes, and it still ran in under two seconds:

```
data claim_27v / view=claim_27v;
  set claims;
  by claim_number;
  where loss_code = '27';
  if first.claim_number;
  keep claim_number;
run;

data want;
  merge claims(keep=claim_number)
        claim_27v(in=in_loss);
  by claim_number;
  if not in_loss;
  if first.claim_number;
  keep claim_number;
run;
```

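The two-step shape of that view approach can be sketched in Python as well (hypothetical, using the sample data from the original post): first collect the claims that *do* have a plain '27', then walk the full list once and keep each distinct claim not in that set.

```python
# Sample rows from the original post: (claim_number, loss_code).
rows = [
    ("CML0001", "27"),  ("CML0002", "27C"), ("CML0002", "27"),
    ("CML0003", "27Y"), ("CML0003", "27C"), ("CML0003", "27P"),
    ("CML0004", "27O"), ("CML0004", "27Y"), ("CML0004", "27"),
    ("CML0004", "27P"), ("CML0005", "27P"), ("CML0005", "27O"),
    ("CML0005", "27C"), ("CML0005", "27"),  ("CML0006", "27Y"),
    ("CML0006", "27C"), ("CML0007", "27O"), ("CML0007", "27Y"),
    ("CML0008", "27O"), ("CML0009", "27P"),
]

# Step 1 (plays the role of the claim_27v view): claims that DO have '27'.
claims_with_27 = {claim for claim, code in rows if code == "27"}

# Step 2 (plays the role of the merge with in=): keep each remaining
# distinct claim, in first-appearance order.
seen = set()
want = []
for claim, _ in rows:
    if claim not in claims_with_27 and claim not in seen:
        seen.add(claim)
        want.append(claim)

print(want)  # ['CML0003', 'CML0006', 'CML0007', 'CML0008', 'CML0009']
```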
Super User
Posts: 10,784

## Re: Urgent Help

```
data claims;
  infile cards dsd dlm='09'x;
  attrib claim_number length=$7;
  attrib loss_code length=$3;
  input claim_number loss_code;
  cards;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
;
run;

proc sql;
  select distinct claim_number
    from claims
    group by claim_number
    having sum(loss_code='27')=0;
quit;
```

Valued Guide
Posts: 505

## Re: Urgent Help

If you have a very large table this may be the fastest method. I think it reuses the cache created by the first SET?

```
data want;
  retain gotone 0;
  do until (last.claim_number);
    set claims;
    by claim_number;
    if loss_code='27' then gotone=1;
  end;
  put gotone;
  do until (last.claim_number);
    set claims;
    by claim_number;
    if last.claim_number and gotone=0 then output;
  end;
  gotone=0;
run;quit;
```
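The DOW loop above reads each BY group twice: the first pass sets the flag if any plain '27' appears in the group, and the second pass outputs the claim only when the flag stayed zero. That two-pass-per-group idea can be sketched in Python over rows already sorted by claim number (hypothetical, using the sample data from the original post):

```python
from itertools import groupby

# Sample rows from the original post, sorted by claim_number.
rows = [
    ("CML0001", "27"),  ("CML0002", "27C"), ("CML0002", "27"),
    ("CML0003", "27Y"), ("CML0003", "27C"), ("CML0003", "27P"),
    ("CML0004", "27O"), ("CML0004", "27Y"), ("CML0004", "27"),
    ("CML0004", "27P"), ("CML0005", "27P"), ("CML0005", "27O"),
    ("CML0005", "27C"), ("CML0005", "27"),  ("CML0006", "27Y"),
    ("CML0006", "27C"), ("CML0007", "27O"), ("CML0007", "27Y"),
    ("CML0008", "27O"), ("CML0009", "27P"),
]

want = []
for claim, group in groupby(rows, key=lambda r: r[0]):
    codes = [code for _, code in group]          # first pass: scan the group
    gotone = any(code == "27" for code in codes)  # the DOW's flag
    if not gotone:                                # second pass's output test
        want.append(claim)

print(want)  # ['CML0003', 'CML0006', 'CML0007', 'CML0008', 'CML0009']
```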
Super Contributor
Posts: 268

## Re: Urgent Help

@rogerjdeangelis I ran the simple merge against your code over 10 million rows - there's nothing in it. Effectively by reusing the dataset in a merge, the cache is reused as well.

You may like to take the put statement out though - I think you might have log problems!

Valued Guide
Posts: 505

## Re: Urgent Help

It worked for me; see the log below. This runs both the merge and the DOW (two SET statements).

I ran 100,000,000 rows. My $600 workstation is too fast for 10,000,000.

```
data claims;
  input claim_number $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;

data want(keep=claim_number);
  retain gotone 0;
  do until (last.claim_number);
    set claims;
    by claim_number;
    if loss_code='27' then gotone=1;
  end;
  do until (last.claim_number);
    set claims;
    by claim_number;
    if last.claim_number and gotone=0 then output;
  end;
  gotone=0;
run;quit;

proc print data=want;
run;quit;
```

Log:

```
456   data claims;
457   input claim_number $ loss_code $;
458   cards4;

NOTE: The data set WORK.CLAIMS has 20 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

479   ;;;;
480   run;
481   data want(keep=claim_number);
482     retain gotone 0;
483     do until (last.claim_number);
484        set claims;
485        by claim_number;
486        if loss_code='27' then gotone=1;
487     end;
488     do until (last.claim_number);
489        set claims;
490        by claim_number;
491        if last.claim_number and gotone=0 then output;
492     end;
493     gotone=0;
494   run;

NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

494 !     quit;
495   proc print data=want;
496   run;

NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds

496 !     quit;
```

Output:

```
Up to 40 obs from WORK.WANT total obs=5

              CLAIM_
       Obs    NUMBER

        1     CML0003
        2     CML0006
        3     CML0007
        4     CML0008
        5     CML0009
```

Benchmark:

```
data claims100mm;
  set claims;
  do rec=1 to 5000000;
    output;
  end;
run;quit;

data want;
  merge claims100mm(keep=claim_number)
        claims100mm(in=in_loss where=(loss_code = '27'));
  by claim_number;
  if not in_loss;
  if first.claim_number;
  keep claim_number;
run;
```

Merge log:

```
542   data want;
543   merge claims100mm(keep=claim_number)
544         claims100mm(in=in_loss where=(loss_code = '27'));
545   by claim_number;
546   if not in_loss;
547   if first.claim_number;
548   keep claim_number;
549   run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 20000000 observations read from the data set WORK.CLAIMS100MM.
      WHERE loss_code='27';
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           27.63 seconds
      cpu time            27.62 seconds
```

DOW:

```
data want(keep=claim_number);
  retain gotone 0;
  do until (last.claim_number);
    set claims100mm;
    by claim_number;
    if loss_code='27' then gotone=1;
  end;
  do until (last.claim_number);
    set claims100mm;
    by claim_number;
    if last.claim_number and gotone=0 then output;
  end;
  gotone=0;
run;

NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           27.08 seconds
      cpu time            27.00 seconds
```
☑ This topic is solved.