Solved: Code for deleting repeated numbers

Barkamih · Posted 10-08-2017 09:24 AM

Hi guys

I'm looking for code that will delete every value that is repeated three times or fewer in the column COW_ID. My dataset is work.mfd, and this column has over one million observations.

COW_ID

62103918

62105823

62105952

62109009

62115446

62132148

my regards

Ibrahim

Tom · Posted 10-08-2017 11:25 AM

A simple SQL query should do that.

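proc sql;
  /* HAVE stands for the input dataset (work.mfd in the question) */
  /* PROC SQL remerges the group count onto every row and logs a note about it */
  create table want as
    select *
    from have
    group by cow_id
    having count(*) > 3
  ;
quit;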

If the data is already sorted by COW_ID, you could probably get a quicker result using a data step with a double DOW loop. The first loop counts the records in each group and the second controls which records are output.

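data want ;
  /* First pass: count the records in this COW_ID group into _N_ */
  do _N_=1 by 1 until (last.cow_id);
    set have ;
    by cow_id;
  end;
  /* Second pass: re-read the same group, keep it only if it has more than 3 records */
  do until (last.cow_id);
    set have ;
    by cow_id;
    if _N_ > 3 then output;
  end;
run;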
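
To try the DOW-loop approach on a small scale first, something like the sketch below may help. The dataset names work.demo and work.demo_want and the ID values are made up for illustration and are not from the original post. ID 1 appears four times, so only its four rows should end up in the output.

```sas
/* Hypothetical test data: ID 1 occurs 4 times, ID 2 twice, ID 3 once */
data work.demo;
  input cow_id;
  datalines;
1
2
1
3
1
2
1
;
run;

/* The DOW-loop data step needs the input sorted (or grouped) by COW_ID */
proc sort data=work.demo;
  by cow_id;
run;

/* First loop counts the group in _N_; second loop re-reads it and outputs it */
data work.demo_want;
  do _N_ = 1 by 1 until (last.cow_id);
    set work.demo;
    by cow_id;
  end;
  do until (last.cow_id);
    set work.demo;
    by cow_id;
    if _N_ > 3 then output;
  end;
run;

proc print data=work.demo_want;
run;
```

If the real data in work.mfd is not already in COW_ID order, the PROC SORT step has to come first, and at over a million rows the SQL version may then be the simpler choice.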

Dusan_C · Posted 10-08-2017 10:08 AM

Adapt the code from "Counting Duplicate Rows in a Table" to mark all of those records, and then remove them with a DELETE statement.

I am assuming that by

"delete any number repeated three times or less of this column (COW_ID)"

you mean deleting all records whose COW_ID appears three times or fewer.
Put that way, only records whose COW_ID appears four or more times will remain.
Either way, you can change the deletion condition in the DELETE statement however you want 😉
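
The code referred to above is not reproduced in this thread, so the following is only a sketch of what a count-then-delete approach might look like in PROC SQL. The helper table work.id_counts and the column n_rows are hypothetical names chosen for illustration. Counting into a separate table first means the DELETE does not have to query the table it is modifying.

```sas
proc sql;
  /* Step 1: count how many rows each COW_ID has (hypothetical helper table) */
  create table work.id_counts as
    select cow_id, count(*) as n_rows
    from work.mfd
    group by cow_id;

  /* Step 2: delete, in place, every row whose COW_ID occurs 3 times or fewer */
  delete from work.mfd
    where cow_id in
      (select cow_id from work.id_counts where n_rows <= 3);
quit;
```

Changing the threshold in the final WHERE clause is the "change the condition however you want" part. Note that this modifies work.mfd itself, so working on a copy first may be safer.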

Astounding · Posted 10-08-2017 10:09 AM

Are you trying to remove all instances of these ID values?

62105823

62109009

62115446

Barkamih · Posted 10-08-2017 10:42 AM

Thanks for your reply, Astounding.

Yes, that is exactly what I need.

Any ideas, please?
