Help using Base SAS procedures

Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Hello all,

I managed to resolve my first (harder) query, I'm now stuck which this one, which should be easy!

I have a list which contains ~50 variables per observation. Some observations (which I'm usinag as the ID) are duplicates, sorted by observation/ID.

I'm wanting to delete all pairs (i.e both entires) leaving me with only the observations which only appear once i.e were not one of a pair.

I've scoured the web and found nothing! But loads about removing duplicates to leave one of the duplicate pair, which isn't what I'm after!

Any help is truly appreciated, it should be so easy I'll kick myself.

Many thanks,


Accepted Solutions
Solution
‎11-14-2013 08:25 AM
Occasional Contributor
Posts: 13

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

proc sql;

select *

from table

where ID_columns not in (

     select

     ID_columns

     from (

          select

          ID_columns

          , count(*) as records

          from table

          group by ID_columns

     )

     where records > 1

);

quit;run;

View solution in original post


All Replies
Solution
‎11-14-2013 08:25 AM
Occasional Contributor
Posts: 13

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

proc sql;

select *

from table

where ID_columns not in (

     select

     ID_columns

     from (

          select

          ID_columns

          , count(*) as records

          from table

          group by ID_columns

     )

     where records > 1

);

quit;run;

New Contributor
Posts: 4

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Thanks,

Could I be a fool and ask you to spell it out, I'm having a try but getting a few errors (cannot find work.data.table etc)

Thanks!

Occasional Contributor
Posts: 13

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Replace ID_columns with the columns that make up your id you mentioned.

Replace table with the dataset you are using (eg. work.some_ds or whatever).

I don't see any specific details about the dataset you are using in this post.

N/A
Posts: 1

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

/* maybe a sample dataset in the example would help */

data shoes;

set sashelp.shoes;id_columns = _n_;

run;

data shoes1; set sashelp.shoes(obs=11) ;

id_columns = _n_;run;

data shoes1;

set shoes shoes1;

run;

proc sql;

create table shoes2 as select *

from shoes1 where ID_columns not in (     select     ID_columns     from (           select           ID_columns           , count(*) as records         

from shoes1         

group by ID_columns     )   

where records > 1 );

quit;

run;

data ck; set shoes2;

run;

Occasional Contributor
Posts: 13

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

How about this? A little cleanup.

/* maybe a sample dataset in the example would help */

data shoes;

set sashelp.shoes;id_columns = _n_;

run;

data shoes1; set sashelp.shoes(obs=11) ;

id_columns = _n_;run;

data shoes1;

set shoes shoes1;

run;

proc sql;

    

  select          

  region

  , product

  , subsidiary

  , max(ID_columns) as id_columns

  from shoes1        

  group by

  region

  , product

  , subsidiary

;

quit;

proc sql;

create table shoes2 as select *

from shoes1

where id_columns in (

  select          

  max(ID_columns) as id_columns

  from shoes1        

  group by

  region

  , product

  , subsidiary

);

quit;

run;

data ck; set shoes2;

run;

Super Contributor
Posts: 339

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

I struggle to understand how your dataset is built esp with the naming of ID as Observation but anyway from what I get

you have a dataset, let's call it "have" as by the standards around the forums. It is already sorted by a variable called ID and you have duplicate IDs. You wish to subset to have only records that have no duplicates? If so, and again, this relies on the existing sortation, you can use by processing to trivialize the job.

data want;

     set have;

     by ID;

     if first.id and last.id then output;

run;

The above outputs a record to the dataset WANT only if it is simultaneously the first and the last value of a by-group effectively only outputting IDs that have no duplicates. It is also significantly faster than query on a subquery with count() from SQL if you are not running on a DBMS

Vincent

Respected Advisor
Posts: 3,124

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Just for the sake of SQL, subquery is not needed, but I am not sure how that affects performance:

proc sql;

  create table want as

    select * from have

   group by id

   having count(*) <2;

quit;

Haikuo

Occasional Contributor
Posts: 13

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

This won't work. You will be completely removing observations with more than 1 record. You want to keep one observation.

By the way, you could also use proc sort and dedupkey.

Respected Advisor
Posts: 3,124

Re: Removing duplicate pairs i.e keeping only unique values that weren't part of a pair

Not sure about OP's real intention, but this is the quote from the initial post, maybe the course has been shifted down the line?

"I'm wanting to delete all pairs (i.e both entires) leaving me with only the observations which only appear once i.e were not one of a pair."

You are right, it will remove both, because that is exactly what OP desires, or at least appeared to be so, judging from the above statement. Well, at the same time, I agree this may NOT be a case that is 'beyond reasonable doubt", per se Smiley Happy.

Haikuo

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 1267 views
  • 2 likes
  • 5 in conversation