BookmarkSubscribeRSS Feed
sasboy007
Calcite | Level 5

Hi all,

I have a data file with account related info.  What I'm trying to do is if an account number occurs more than once I want to delete that row.  How do I achieve this in the below data step.  thanks.

data _null_;
set dataIN;
file "c:\test_file.txt";

if  _freq_  >  2 then delete;   /* this is my addition to delete the row*/

put

@001 acct_name  $50.

@051 address $50.

;

Any help would be appreciated.  Thanks.

6 REPLIES 6
ballardw
Super User

Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.

Easiest would be

Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;

sasboy007
Calcite | Level 5

Sorry let me restate my question.  If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.

yeah the data is sorted by acct_name.

Astounding
PROC Star

You'll need a step to count occurrences of account numbers.  Here's one way:

data _null_;

n_accounts=0;

do until (last.acct_name);

   set dataIN;

   by acct_name;

   n_accounts + 1;

end;

file "C:\test_file.txt";

do until (last.acct_name);

  set dataIN;

  by acct_name;

  if n_accounts <= 1000 then put

  @001 acct_name $50.

  @051 address $40.

  ;

end;

run;

You are reading the data twice, but there isn't any way around that.

sasboy007
Calcite | Level 5

Hey Astounding,

Thanks for the loop statement that was helpful.  Can I take it a step further?  Let's say I didn't want to completely remove that account with 1000 rows.  If I wanted to implement a WHILE statement to end at a specific point.  The below is what I have, but doesn't stop at 500 or less for each account.  Thanks.

data _null_;

file "C:\test_file.txt";

n_accounts=0;

do until (last.acct_name);

   set dataIN;

   by acct_name;

   n_accounts + 1;

end;

do while(n_accounts <= 500);

  set dataIN;

  by acct_name;

put

@001 acct_name $50.

@051 address $40.

  ;

end;

run;

LearnByMistk
Obsidian | Level 7

proc sql;

   create table totarecs as

   select * from test where  group by id having count<1000;

   quit;

stat_sas
Ammonite | Level 13

If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt

data _null_;

set dataIN;

file "C:\test_file.txt";

by acct_name;

if first.acct_name and last.acct_name;

put

@001 acct_name  $50.

@051 address $50.

;

run;

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 1375 views
  • 3 likes
  • 5 in conversation