BookmarkSubscribeRSS Feed
podarum
Quartz | Level 8

Hi,   I tried few procedures btu can't quite get it.

I have a bunch of dup and I would like to delete the dup that is NOT in one of the list in another dataset or specified in the proc.

Example:

Have:

PS   CIT  TWN

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

I want to keep the dup that TWN is in a list of (AJX, YYZ).. the rest delete. (the dups must be of PS and CIT obviously)

So want:

PS  CIT  TWN

L1T  Tor  AJX

M1P Tor  YYZ

Thanks

12 REPLIES 12
Ksharp
Super User

It would be helpful that you can post some more sample data.

data have;
input (PS   CIT  TWN) ($);
cards;
L1T  Tor  AJX
L1T  Tor  OTT
L1T  Tor  SHE
M1P Tor  YYZ
M1P Tor  ZHE
;
run;
proc sql;
 create table want as
  select *
   from have
    group by ps,cit
      having count(*) ge 2 and twn in ('AJX', 'YYZ');
quit;


Ksharp

podarum
Quartz | Level 8

I get this Note and 0 observatiosn show up:

NOTE: The query requires remerging summary statistics back with the original data.

art297
Opal | Level 21

The note is normal and not indicative of any problem.  That you get 0 observations, of course, is a significant problem.  The data look like city names, forward postal code sorting areas, and airport codes.  Are they case sensitive or might some records have all lower or upper case or some other feature that would make it so that expected matches won't be found?  You'll have to provide some sample data for which you get the results you mentioned.

podarum
Quartz | Level 8

The code worked well, but it also deletes any indipendedn observations that I would want to keep  eg. The M1G  Tor  YYZ  or  L2E Osh YYZ

Have:

PS   CIT  TWN

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

M1G Tor  YYZ

L2E  Osh  YYZ

art297
Opal | Level 21

Then I'm confused as that it what the code I originally proposed does.  I.e.,

data have;

input (PS   CIT  TWN) ($);

  cards;

L1T  Tor  AJX

L1T  Tor  OTT

L1T  Tor  SHE

M1P Tor  YYZ

M1P Tor  ZHE

M1G Tor  YYZ

L2E  Osh  YYZ

;

proc sort data=have out=want;

   by ps cit twn;

run;

proc sort data=want nodupkey;

  by ps cit;

run;

results in a file with 4 records, namely:

ps   cit   twn

L1T  Tor  AJX

L2E Osh YYZ

M1G Tor  YYZ

M1P Tor  YYZ

podarum
Quartz | Level 8

You are right if the TWN is always the first alphabetical one to use, but this is just a small example I used to illustarte.  But I found a solution..

data want;

set have;

by PS;

retain PS;

if first.PS =last.PS then use 'Y'; else if CIT in ('AJX', 'YYZ') then use = 'Y' else use = 'N'

run;

art297
Opal | Level 21

That wouldn't work unless you mistyped it it meant to type:

if first.PS =last.PS then use='Y';

however, that wouldn't work as you expect.  Given 3 duplicate records, then 2nd record would get accepted as both first.ps and last.ps would equal 0, but would be accepted as "use" since it would meet the condition.

and the rest of the line is missing two more semicolons, none of your cit fields have those values, and the file would still have to be preceded by the correct proc sort.

If it is sorted correctly, then the following would have the same effect as the code I suggested:

data want;

  set have;

  by PS;

  if not((first.PS and last.PS) or twn in ('AJX', 'YYZ'))

   then delete;

run;

podarum
Quartz | Level 8

Yes I meant to type

if first.PS =last.PS then use='Y';  But this is meant only for the non-duplicate records (or individual records), the dups would fall under else if CIT in ('AJX', 'YYZ') then use = 'Y' else use = 'N' . I also mistyped the 2 semi-colons.  You are also right, your code gave me the same results.

art297
Opal | Level 21

What is your rule for deleting duplicates?  It it is first alphabetical, as in your example, then first sort, then sort with nodupkey.  E.g.,

proc sort data=have out=want;

   by ps cit twn;

run;

proc sort data=want nodupkey;

  by ps cit;

run;

podarum
Quartz | Level 8

art my rule is if it's not in the list of selected TWN then it must be deleted

art297
Opal | Level 21

Then I vote for Ksharp's suggested code.

Ksharp
Super User

If you also need to keep the group which only contain one obs. Why not directly use WHERE statement?

data want;

set have;

where twn in ('AJX', 'YYZ');

run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 821 views
  • 0 likes
  • 3 in conversation