Help using Base SAS procedures

delete the dup that is in a group

Reply
Super Contributor
Posts: 401

delete the dup that is in a group

Hi,   I tried few procedures btu can't quite get it.

I have a bunch of dup and I would like to delete the dup that is NOT in one of the list in another dataset or specified in the proc.

Example:

Have:

PS   CIT  TWN

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

I want to keep the dup that TWN is in a list of (AJX, YYZ).. the rest delete. (the dups must be of PS and CIT obviously)

So want:

PS  CIT  TWN

L1T  Tor  AJX

M1P Tor  YYZ

Thanks

Super User
Posts: 10,035

delete the dup that is in a group

It would be helpful that you can post some more sample data.

data have;
input (PS   CIT  TWN) ($);
cards;
L1T  Tor  AJX
L1T  Tor  OTT
L1T  Tor  SHE
M1P Tor  YYZ
M1P Tor  ZHE
;
run;
proc sql;
 create table want as
  select *
   from have
    group by ps,cit
      having count(*) ge 2 and twn in ('AJX', 'YYZ');
quit;


Ksharp

Super Contributor
Posts: 401

delete the dup that is in a group

I get this Note and 0 observatiosn show up:

NOTE: The query requires remerging summary statistics back with the original data.

PROC Star
Posts: 7,474

delete the dup that is in a group

The note is normal and not indicative of any problem.  That you get 0 observations, of course, is a significant problem.  The data look like city names, forward postal code sorting areas, and airport codes.  Are they case sensitive or might some records have all lower or upper case or some other feature that would make it so that expected matches won't be found?  You'll have to provide some sample data for which you get the results you mentioned.

Super Contributor
Posts: 401

delete the dup that is in a group

The code worked well, but it also deletes any indipendedn observations that I would want to keep  eg. The M1G  Tor  YYZ  or  L2E Osh YYZ

Have:

PS   CIT  TWN

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

M1G Tor  YYZ

L2E  Osh  YYZ

PROC Star
Posts: 7,474

Re: delete the dup that is in a group

Then I'm confused as that it what the code I originally proposed does.  I.e.,

data have;

input (PS   CIT  TWN) ($);

  cards;

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

M1G Tor  YYZ

L2E  Osh  YYZ

;

proc sort data=have out=want;

   by ps cit twn;

run;

proc sort data=want nodupkey;

  by ps cit;

run;

results in a file with 4 records, namely:

ps   cit   twn

L1T  Tor  AJX

L2E Osh YYZ

M1G Tor  YYZ

M1P Tor  YYZ

Super Contributor
Posts: 401

delete the dup that is in a group

You are right if the TWN is always the first alphabetical one to use, but this is just a small example I used to illustarte.  But I found a solution..

data want;

set have;

by PS;

retain PS;

if first.PS =last.PS then use 'Y'; else if CIT in ('AJX', 'YYZ') then use = 'Y' else use = 'N'

run;

PROC Star
Posts: 7,474

Re: delete the dup that is in a group

That wouldn't work unless you mistyped it it meant to type:

if first.PS =last.PS then use='Y';

however, that wouldn't work as you expect.  Given 3 duplicate records, then 2nd record would get accepted as both first.ps and last.ps would equal 0, but would be accepted as "use" since it would meet the condition.

and the rest of the line is missing two more semicolons, none of your cit fields have those values, and the file would still have to be preceded by the correct proc sort.

If it is sorted correctly, then the following would have the same effect as the code I suggested:

data want;

  set have;

  by PS;

  if not((first.PS and last.PS) or twn in ('AJX', 'YYZ'))

   then delete;

run;

Super Contributor
Posts: 401

Re: delete the dup that is in a group

Yes I meant to type

if first.PS =last.PS then use='Y';  But this is meant only for the non-duplicate records (or individual records), the dups would fall under else if CIT in ('AJX', 'YYZ') then use = 'Y' else use = 'N' . I also mistyped the 2 semi-colons.  You are also right, your code gave me the same results.

PROC Star
Posts: 7,474

Re: delete the dup that is in a group

What is your rule for deleting duplicates?  It it is first alphabetical, as in your example, then first sort, then sort with nodupkey.  E.g.,

proc sort data=have out=want;

   by ps cit twn;

run;

proc sort data=want nodupkey;

  by ps cit;

run;

Super Contributor
Posts: 401

delete the dup that is in a group

art my rule is if it's not in the list of selected TWN then it must be deleted

PROC Star
Posts: 7,474

delete the dup that is in a group

Then I vote for Ksharp's suggested code.

Super User
Posts: 10,035

Re: delete the dup that is in a group

If you also need to keep the group which only contain one obs. Why not directly use WHERE statement?

data want;

set have;

where twn in ('AJX', 'YYZ');

run;

Ksharp

Ask a Question
Discussion stats
  • 12 replies
  • 138 views
  • 0 likes
  • 3 in conversation