BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Barkamih
Pyrite | Level 9

Hi guys 

I'm looking for code that will delete any number repeated three times or less of this column (COW_ID), and my dataset name is (work.mfd). I have over one million observations in this column. 

COW_ID

62103918

62103918

62103918

62103918

62105823

62105823

62105823

62105952

62105952

62105952

62105952

62105952

62109009

62115446

62115446

62132148

62132148

62132148

62132148

62132148

62132148

62132148

 

my regards 

 

Ibrahim 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

A simple SQL query should do that.

 

create table want as
  select * 
  from have 
  group by cow_id
  having count(*) > 3 
;

If the data is sorted you could probably get a quicker result using a data step with a double DOW loop. First loop to count and second to control which records are output.

data want ;
  do _N_=1 by 1 until (last.cowid);
    set have ;
    by cowid;
  end;
  do until (last.cowid);
    set have ;
    by cowid;
    if _N_ > 3 then output;
  end;
run;

 

View solution in original post

4 REPLIES 4
Dusan_C
Calcite | Level 5

Use adaptation of this code to mark all those records with Counting Duplicate Rows in a Table, and then delete them with DELETE

I am assuming that by 

delete any number repeated three times or less of this column (COW_ID)

you mean delete all records that repeat 3 times or less.
If you formulate it like that then that means that only records that appear 4x or higher will remain.
Nevertheless, you can change condition for deletion however you want in the DELETE statement 😉

Astounding
PROC Star

Are you trying to remove all instances of these ID values?

 

62105823

62109009

62115446

Barkamih
Pyrite | Level 9

Thanks   for your reply

yes ,that is what I need exactly

Any ideas please?

Tom
Super User Tom
Super User

A simple SQL query should do that.

 

create table want as
  select * 
  from have 
  group by cow_id
  having count(*) > 3 
;

If the data is sorted you could probably get a quicker result using a data step with a double DOW loop. First loop to count and second to control which records are output.

data want ;
  do _N_=1 by 1 until (last.cowid);
    set have ;
    by cowid;
  end;
  do until (last.cowid);
    set have ;
    by cowid;
    if _N_ > 3 then output;
  end;
run;

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 917 views
  • 2 likes
  • 4 in conversation