Hi, I have a list of claim numbers, each with one or more loss codes such as 27, 27C, 27Y, 27O, and 27P. My problem is that I want to get the claim numbers whose loss codes are all something other than 27, i.e. claims that never have the loss code 27. Hope that makes sense.
Claim Number | Loss Code |
CML0001 | 27 |
CML0002 | 27C |
CML0002 | 27 |
CML0003 | 27Y |
CML0003 | 27C |
CML0003 | 27P |
CML0004 | 27O |
CML0004 | 27Y |
CML0004 | 27 |
CML0004 | 27P |
CML0005 | 27P |
CML0005 | 27O |
CML0005 | 27C |
CML0005 | 27 |
CML0006 | 27Y |
CML0006 | 27C |
CML0007 | 27O |
CML0007 | 27Y |
CML0008 | 27O |
CML0009 | 27P |
The claim numbers below should be displayed:
CML0003 |
CML0006 |
CML0007 |
CML0008 |
CML0009 |
Thanks in advance.
By merging the dataset in twice, once keeping only claim_number and once with a WHERE clause on the loss code you don't want, and by keeping only the first. observation per claim_number, this works (as long as '27' is unique per claim number):
data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
loss_code;
cards;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;
run;
data want;
   /* merge the claims with themselves: the second copy keeps only the
      '27' rows, so in_loss flags claims that have a 27 loss code */
   merge claims(keep=claim_number)
         claims(in=in_loss where=(loss_code = '27'));
   by claim_number;
   if not in_loss;          /* keep claims with no 27 loss code */
   if first.claim_number;   /* one output row per claim */
   keep claim_number;
run;
Is that all you want?
That's right on, Laurie 🙂 Thanks. I have duplicate claim numbers with the same loss codes, but I can run a PROC SORT by claim number and loss code first and then execute the same program to get the result.
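A minimal sketch of that dedup step, assuming NODUPKEY is what you have in mind and using a hypothetical output name claims_dedup (not from the thread):

proc sort data=claims out=claims_dedup nodupkey;
   /* keep one row per claim_number/loss_code pair before the merge step */
   by claim_number loss_code;
run;

The merge program can then be pointed at claims_dedup instead of claims.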
Technically speaking, you don't have to dedup where you've got multiple '27's in a claim; you'll get a note in the log about a MERGE statement with repeats of BY values, but in this case it doesn't matter.
If you want to, you can do it in a data step (with or without a view - it's about the same efficiency). I randomly generated 10 million claims and loss codes, and it still ran in under two seconds:
data claim_27v / view=claim_27v;
   /* view holding one row per claim that does have a 27 loss code */
   set claims;
   by claim_number;
   where loss_code = '27';
   if first.claim_number;
   keep claim_number;
run;
data want;
   merge claims(keep=claim_number)
         claim_27v(in=in_loss);
   by claim_number;
   if not in_loss;          /* drop claims that appear in the 27 view */
   if first.claim_number;
   keep claim_number;
run;
data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
loss_code;
cards;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;
run;
proc sql;
   /* sum(loss_code='27') counts the 27 rows within each claim group;
      keep the claims where that count is zero */
   select distinct claim_number
      from claims
      group by claim_number
      having sum(loss_code='27') = 0;
quit;
If you have a very large table, this may be the fastest method.
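For comparison only (this is a sketch I'm adding, not code from the thread), the same result can be written with PROC SQL's EXCEPT set operator:

proc sql;
   /* all claim numbers, minus those with at least one 27 row;
      EXCEPT returns distinct rows by default */
   select claim_number from claims
   except
   select claim_number from claims where loss_code = '27';
quit;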
I think the second SET statement in the step below reuses the cache created by the first SET?
data want;
   retain gotone 0;
   /* first pass over the claim group: flag whether any 27 occurs */
   do until (last.claim_number);
      set claims;
      by claim_number;
      if loss_code='27' then gotone=1;
   end;
   put gotone;
   /* second pass over the same group: output once if no 27 was found */
   do until (last.claim_number);
      set claims;
      by claim_number;
      if last.claim_number and gotone=0 then output;
   end;
   gotone=0;
run;quit;
@rogerjdeangelis I ran the simple merge against your code over 10 million rows - there's nothing in it. Effectively, by reusing the dataset in a merge, the cache is reused as well.
You may like to take the PUT statement out, though - I think you might have log problems!
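If you do want to keep some diagnostic output without flooding the log, one option (my own sketch; the cap of 20 is arbitrary) is to guard the PUT inside that step, e.g.:

   /* write the flag for at most the first 20 claim groups */
   if _n_ <= 20 then put claim_number= gotone=;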
It worked for me, see log below.
The merge and the DOW (two set statements)
had identical times.
I ran 100,000,000 rows; my $600 workstation is too fast for 10,000,000.
data claims;
input claim_number $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;
data want(keep=claim_number);
retain gotone 0;
do until (last.claim_number);
set claims;
by claim_number;
if loss_code='27' then gotone=1;
end;
do until (last.claim_number);
set claims;
by claim_number;
if last.claim_number and gotone =0 then output;
end;
gotone=0;
run;quit;
proc print data=want;
run;quit;
456 data claims;
457 input claim_number $ loss_code $;
458 cards4;
NOTE: The data set WORK.CLAIMS has 20 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.00 seconds
479 ;;;;
480 run;
481 data want(keep=claim_number);
482 retain gotone 0;
483 do until (last.claim_number);
484 set claims;
485 by claim_number;
486 if loss_code='27' then gotone=1;
487 end;
488 do until (last.claim_number);
489 set claims;
490 by claim_number;
491 if last.claim_number and gotone =0 then output;
492 end;
493 gotone=0;
494 run;
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
494 ! quit;
495 proc print data=want;
496 run;
NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
496 ! quit;
Up to 40 obs from WORK.WANT total obs=5
Obs    CLAIM_NUMBER
1      CML0003
2      CML0006
3      CML0007
4      CML0008
5      CML0009
Benchmark
data claims100mm;
   /* 20 input rows x 5,000,000 copies = 100,000,000 rows */
   set claims;
   do rec=1 to 5000000;
      output;
   end;
run;quit;
data want;
merge claims100mm(keep=claim_number)
claims100mm(in=in_loss where=(loss_code = '27'));
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;
542 data want;
543 merge claims100mm(keep=claim_number)
544 claims100mm(in=in_loss where=(loss_code = '27'));
545 by claim_number;
546 if not in_loss;
547 if first.claim_number;
548 keep claim_number;
549 run;
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 20000000 observations read from the data set WORK.CLAIMS100MM.
WHERE loss_code='27';
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 27.63 seconds
cpu time 27.62 seconds
data want(keep=claim_number);
retain gotone 0;
do until (last.claim_number);
set claims100mm;
by claim_number;
if loss_code='27' then gotone=1;
end;
do until (last.claim_number);
set claims100mm;
by claim_number;
if last.claim_number and gotone =0 then output;
end;
gotone=0;
run;
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 27.08 seconds
cpu time 27.00 seconds