BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
avinashns
Obsidian | Level 7

Hi Team,

I have a sas dataset with around 30 variables, Out of which mainly i have concentrate on two variables shown below

POL_ID     Amount

-----------------------------------

AZN123     $25

ABN456     $44

AZP123     $50

ABC456     $20

CRG020     $60

AZC123     $15

ABP456     $10

NYP020     $50

HRG090     $10

Conditions:

1) Eliminate Duplicate records

a)Eliminate records with AZN and retain the record with AZC.

b) Eliminate records with AZP and retain the record with AZC.

c) Eliminate records with ABN and retain the record with ABC

d) Eliminate records with ABP and retain the record with ABC

2) Combine Premium

Combine the Amount from the A_N and A_P records with the A_C record.

Note: S/w: SAS9.3, O/S: Linux

Expected Output(order doesn't matter):

POL_ID     Amount

-------------------------------------

AZC123     $90

ABC456     $74

NY020       $50

CRG020     $60

HRG090     $10

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

Avinash wrote me off line, but I still don't know if I understand what he is trying to accomplish. The following is my best guess for now:

Given the following data:

data have;

  informat pol_id $6.;

  informat amount comma8.;

  input POL_ID     Amount;

  cards;

AZN123     $25

ABN456     $44

AZP123     $50

ABC456     $20

CRG020     $60

AZC123     $15

ABP456     $10

NYP020     $50

HRG090     $10

AZN999     $25

LLL999     $30

;

/*collapse by pol_id*/

proc summary data=have nway;

  var amount;

  class pol_id;

  output out=need (drop=_:)sum=;

run;

/*extract _id and _prefix*/

data need;

  set need;

  _prefix=substr(pol_id,1,3);

  _id=substr(pol_id,4);

run;

/*sort by _id and _prefix*/

proc sort data=need;

  by _id _prefix;

run;

/*change certain pol_ids if first record begins with AZC or ABC*/

data need;

  set need;

  by _id;

  retain _changeit;

  if first._id then do;

    if _prefix in ('AZC','ABC') then _changeit=1;

    else _changeit=0;

  end;

  else if _changeit then do;

    pol_id=tranwrd(pol_id,'AZN','AZC');

    pol_id=tranwrd(pol_id,'AZP','AZC');

    pol_id=tranwrd(pol_id,'ABN','ABC');

    pol_id=tranwrd(pol_id,'ABP','ABC');

  end;

run;

/*Collapse the modified data*/

proc summary data=need nway;

  var amount;

  class pol_id;

  output out=want (drop=_:)sum=;

run;

View solution in original post

8 REPLIES 8
GreggB
Pyrite | Level 9

this will get rid of records with duplicate POL_ID:

proc sort data=mydata nodupkey;

by POL_ID;

run;

CAUTION: this will get rid of records that have the same POL_ID even if the amounts are different.  Is this what  you want to happen?

avinashns
Obsidian | Level 7

Hi Gregg,

No, Thats not i am expecting,

What i am thinking is,

split the POL_ID in to two parts as prifex=first 3 char of POLID , ID=rest of the char (example prefix=AZN, ID=123)

proc sort data=inputdata;

by ID Prefix;

run;

Here AZC,AZN & AZP has same ID which is 123, Now Combine the Amount from the A_N and A_P records with the A_C record. and remove AZP & AZN records from the dataset, So that out of 3 only one records remains i.e AZC123 with total sum.

We can use the  FIRST.VAR and retain logic, Expecting the better approach from you all

Hope it give a clear picture, Let me know if you need any more information..

Reeza
Super User

First.Var/Last.var and retain logic would be the solution I would recommend.

Because you have to split the ID and perform summation other procs will require pre-processing of the data which means multiple data passes. A data step would be a single pass with a pre-sort.

avinashns
Obsidian | Level 7

Hi Team,

How to retain the record for multiple iteration,

For example:

ID     Prefix     Amount     V1      V2     V3     V4 ....... V30

123   ABC      10               q      w       r          t          y

123   ABN       20              a      s       d         f          g

123   ABP       30              z      x       c         v          b

The Required Output:

123    ABC  60     q   w   r    t   y

I need all the variable of first.ID wit Prefix=ABC wit sum of (ABC+ABN+ABP) and delete the  ABN and ABP record

art297
Opal | Level 21

Your directions aren't very clear to me. Looks like you simply want to recode some values and then summarize the data to the pol_id level.

If that is what you are trying to do, then here is one way:

data need;

  set have;

  pol_id=tranwrd(pol_id,'AZN','AZC');

  pol_id=tranwrd(pol_id,'AZP','AZC');

  pol_id=tranwrd(pol_id,'ABN','ABC');

  pol_id=tranwrd(pol_id,'ABP','ABC');

run;

proc summary data=need nway;

  var amount;

  class pol_id;

  output out=want (drop=_:) sum=;

run;

art297
Opal | Level 21

Avinash wrote me off line, but I still don't know if I understand what he is trying to accomplish. The following is my best guess for now:

Given the following data:

data have;

  informat pol_id $6.;

  informat amount comma8.;

  input POL_ID     Amount;

  cards;

AZN123     $25

ABN456     $44

AZP123     $50

ABC456     $20

CRG020     $60

AZC123     $15

ABP456     $10

NYP020     $50

HRG090     $10

AZN999     $25

LLL999     $30

;

/*collapse by pol_id*/

proc summary data=have nway;

  var amount;

  class pol_id;

  output out=need (drop=_:)sum=;

run;

/*extract _id and _prefix*/

data need;

  set need;

  _prefix=substr(pol_id,1,3);

  _id=substr(pol_id,4);

run;

/*sort by _id and _prefix*/

proc sort data=need;

  by _id _prefix;

run;

/*change certain pol_ids if first record begins with AZC or ABC*/

data need;

  set need;

  by _id;

  retain _changeit;

  if first._id then do;

    if _prefix in ('AZC','ABC') then _changeit=1;

    else _changeit=0;

  end;

  else if _changeit then do;

    pol_id=tranwrd(pol_id,'AZN','AZC');

    pol_id=tranwrd(pol_id,'AZP','AZC');

    pol_id=tranwrd(pol_id,'ABN','ABC');

    pol_id=tranwrd(pol_id,'ABP','ABC');

  end;

run;

/*Collapse the modified data*/

proc summary data=need nway;

  var amount;

  class pol_id;

  output out=want (drop=_:)sum=;

run;

avinashns
Obsidian | Level 7

BangON,

The second part with _id and _prefix is the exact solution i was looking for.

Thank you once again. Smiley Happy

avinashns
Obsidian | Level 7

HI Team,

As per Arthur's solutions the below code will fetch what i want

/*Collapse the modified data*/

proc summary data=need nway;

  var amount;

  class pol_id;

  output out=want (drop=_:)sum=;

run;

But the output contains only pol_id and amount variable,

I would like to have all the variable

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2044 views
  • 0 likes
  • 4 in conversation