BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkamih
Pyrite | Level 9

Hi guys 

I'm looking for code that will delete any number repeated three times or less of this column (COW_ID), and my dataset name is (work.mfd). I have over one million observations in this column. 

COW_ID

62103918

62103918

62103918

62103918

62105823

62105823

62105823

62105952

62105952

62105952

62105952

62105952

62109009

62115446

62115446

62132148

62132148

62132148

62132148

62132148

62132148

62132148

 

my regards 

 

Ibrahim 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

A simple SQL query should do that.

 

create table want as
  select * 
  from have 
  group by cow_id
  having count(*) > 3 
;

If the data is sorted you could probably get a quicker result using a data step with a double DOW loop. First loop to count and second to control which records are output.

data want ;
  do _N_=1 by 1 until (last.cowid);
    set have ;
    by cowid;
  end;
  do until (last.cowid);
    set have ;
    by cowid;
    if _N_ > 3 then output;
  end;
run;

 

View solution in original post

4 REPLIES 4
Dusan_C
Calcite | Level 5

Use adaptation of this code to mark all those records with Counting Duplicate Rows in a Table, and then delete them with DELETE

I am assuming that by 

delete any number repeated three times or less of this column (COW_ID)

you mean delete all records that repeat 3 times or less.
If you formulate it like that then that means that only records that appear 4x or higher will remain.
Nevertheless, you can change condition for deletion however you want in the DELETE statement 😉

Astounding
PROC Star

Are you trying to remove all instances of these ID values?

 

62105823

62109009

62115446

Barkamih
Pyrite | Level 9

Thanks   for your reply

yes ,that is what I need exactly

Any ideas please?

Tom
Super User Tom
Super User

A simple SQL query should do that.

 

create table want as
  select * 
  from have 
  group by cow_id
  having count(*) > 3 
;

If the data is sorted you could probably get a quicker result using a data step with a double DOW loop. First loop to count and second to control which records are output.

data want ;
  do _N_=1 by 1 until (last.cowid);
    set have ;
    by cowid;
  end;
  do until (last.cowid);
    set have ;
    by cowid;
    if _N_ > 3 then output;
  end;
run;

 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 904 views
  • 2 likes
  • 4 in conversation