Help using Base SAS procedures

Urgent Help

Accepted Solution Solved
Reply
Contributor
Posts: 24
Accepted Solution

Urgent Help

Hi i have list of claim numbers each with different loss codes like 27,27C,27Y,27O,27P . Now my problem is i watnt to get the claim numbers which have the loss codes except 27. Hope it make sense.

 

Claim NumersLoss codes
CML000127
CML000227C
CML000227
CML000327Y
CML000327C
CML000327P
CML000427O
CML000427Y
CML000427
CML000427P
CML000527P
CML000527O
CML000527C
CML000527
CML000627Y
CML000627C
CML000727O
CML000727Y
CML000827O
CML000927P

 

Below claim numbers should be displayed. 

CML0003
CML0006
CML0007
CML0008
CML0009

 

Thanks in advanceee


Accepted Solutions
Solution
‎02-27-2017 04:30 AM
Super Contributor
Posts: 252

Re: Urgent Help

By merging the dataset in twice, but with a where clause on the claims you don't want, and by doing a first. on claim_number, this works (as long as '27's will be unique per claim number):

data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
      loss_code;
cards;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
;
run;

data want;
merge claims(keep=claim_number)
      claims(in=in_loss where=(loss_code = '27'));
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;

Is that all you want?

View solution in original post


All Replies
Solution
‎02-27-2017 04:30 AM
Super Contributor
Posts: 252

Re: Urgent Help

By merging the dataset in twice, but with a where clause on the claims you don't want, and by doing a first. on claim_number, this works (as long as '27's will be unique per claim number):

data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
      loss_code;
cards;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
;
run;

data want;
merge claims(keep=claim_number)
      claims(in=in_loss where=(loss_code = '27'));
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;

Is that all you want?

Contributor
Posts: 24

Re: Urgent Help

That's Right on Laurie Smiley Happy Thanks . I have duplicate claim numbers with same loss codes but i can perform the proc sort by claim number and loss codes and execute the same program to get the result.

Super Contributor
Posts: 252

Re: Urgent Help

Technically speaking, you don't have to dedup where you've got multiple '27's in a claim - you'll get a warning about a merge statement with multiple instances of by variables. But in this case it doesn't matter.

 

If you want to, you can do it in a data step (with or without a view - it's about the same efficiency). I randomly generated 10 million claims and loss codes, and it still ran in under two seconds:

data claim_27v / view=claim_27v;
set claims;
by claim_number;
where loss_code = '27';
if first.claim_number;
keep claim_number;
run;

data want;
merge claims(keep=claim_number)
      claim_27v(in=in_loss);
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;
Super User
Posts: 10,041

Re: Urgent Help

data claims;
infile cards dsd dlm='09'x;
attrib claim_number length=$ 7;
attrib loss_code length=$ 3;
input claim_number
      loss_code;
cards;
CML0001	27
CML0002	27C
CML0002	27
CML0003	27Y
CML0003	27C
CML0003	27P
CML0004	27O
CML0004	27Y
CML0004	27
CML0004	27P
CML0005	27P
CML0005	27O
CML0005	27C
CML0005	27
CML0006	27Y
CML0006	27C
CML0007	27O
CML0007	27Y
CML0008	27O
CML0009	27P
;
run;
proc sql;
select distinct claim_number
 from claims
  group by claim_number
   having sum(loss_code='27')=0;
quit;
Valued Guide
Posts: 505

Re: Urgent Help

If you have a very large table this may be the fastest method.
I think it reuses the cache created by the first 'set'?

data want;
  retain gotone 0;
  do until (last.claim_number);
     set claims;
     by claim_number;
     if loss_code='27' then gotone=1;
  end;
  put gotone;
  do until (last.claim_number);
     set claims;
     by claim_number;
     if last.claim_number and gotone =0 then output;
  end;
  gotone=0;
run;quit;


Super Contributor
Posts: 252

Re: Urgent Help

Posted in reply to rogerjdeangelis

@rogerjdeangelis I ran the simple merge against your code over 10 million rows - there's nothing in it. Effectively by reusing the dataset in a merge, the cache is reused as well.

 

You may like to take the put statement out though - I think you might have log problems!

Valued Guide
Posts: 505

Re: Urgent Help

It worked for me, see log below.

The merge and the DOW (two set statements)
had identical times.

I ran 100,000,000. My $600 workstation is too
fast for 10,000,000.


data claims;
input claim_number $ loss_code $;
cards4;
CML0001 27
CML0002 27C
CML0002 27
CML0003 27Y
CML0003 27C
CML0003 27P
CML0004 27O
CML0004 27Y
CML0004 27
CML0004 27P
CML0005 27P
CML0005 27O
CML0005 27C
CML0005 27
CML0006 27Y
CML0006 27C
CML0007 27O
CML0007 27Y
CML0008 27O
CML0009 27P
;;;;
run;


data want(keep=claim_number);
  retain gotone 0;
  do until (last.claim_number);
     set claims;
     by claim_number;
     if loss_code='27' then gotone=1;
  end;
  do until (last.claim_number);
     set claims;
     by claim_number;
     if last.claim_number and gotone =0 then output;
  end;
  gotone=0;
run;quit;

proc print data=want;
run;quit;


456   data claims;
457   input claim_number $ loss_code $;
458   cards4;

NOTE: The data set WORK.CLAIMS has 20 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


479   ;;;;
480   run;
481   data want(keep=claim_number);
482     retain gotone 0;
483     do until (last.claim_number);
484        set claims;
485        by claim_number;
486        if loss_code='27' then gotone=1;
487     end;
488     do until (last.claim_number);
489        set claims;
490        by claim_number;
491        if last.claim_number and gotone =0 then output;
492     end;
493     gotone=0;
494   run;

NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: There were 20 observations read from the data set WORK.CLAIMS.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds


494 !     quit;
495   proc print data=want;
496   run;

NOTE: There were 5 observations read from the data set WORK.WANT.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.02 seconds
      cpu time            0.03 seconds

496 !     quit;


Up to 40 obs from WORK.WANT total obs=5

       CLAIM_
Obs    NUMBER

 1     CML0003
 2     CML0006
 3     CML0007
 4     CML0008
 5     CML0009



Benchmark

data claims100mm;
  set claims;
  do rec=1 to 5000000;
    output;
  end;
run;quit;


data want;
merge claims100mm(keep=claim_number)
      claims100mm(in=in_loss where=(loss_code = '27'));
by claim_number;
if not in_loss;
if first.claim_number;
keep claim_number;
run;


542   data want;
543   merge claims100mm(keep=claim_number)
544         claims100mm(in=in_loss where=(loss_code = '27'));
545   by claim_number;
546   if not in_loss;
547   if first.claim_number;
548   keep claim_number;
549   run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 20000000 observations read from the data set WORK.CLAIMS100MM.
      WHERE loss_code='27';
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           27.63 seconds
      cpu time            27.62 seconds




data want(keep=claim_number);
  retain gotone 0;
  do until (last.claim_number);
     set claims100mm;
     by claim_number;
     if loss_code='27' then gotone=1;
  end;
  do until (last.claim_number);
     set claims100mm;
     by claim_number;
     if last.claim_number and gotone =0 then output;
  end;
  gotone=0;
run;



NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: There were 100000000 observations read from the data set WORK.CLAIMS100MM.
NOTE: The data set WORK.WANT has 5 observations and 1 variables.
NOTE: DATA statement used (Total process time):
      real time           27.08 seconds
      cpu time            27.00 seconds


☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 224 views
  • 2 likes
  • 4 in conversation