sasboy007
Calcite | Level 5

Hi all,

I have a data file with account-related info.  If an account number occurs more than once, I want to delete that row.  How do I achieve this in the data step below?  Thanks.

data _null_;
set dataIN;
file "c:\test_file.txt";

if  _freq_  >  2 then delete;   /* this is my addition to delete the row*/

put

@001 acct_name  $50.

@051 address $50.

;

Any help would be appreciated.  Thanks.

6 REPLIES
ballardw
Super User

Is the data sorted by the variables that determine a duplicate? If not, you are asking for a lot if there are more than a few records.

Easiest would be

Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;

sasboy007
Calcite | Level 5

Sorry, let me restate my question.  If an acct_name occurs more than 1000 times, I need to delete all rows for that account from the file.

Yes, the data is sorted by acct_name.

Astounding
PROC Star

You'll need a step to count occurrences of account numbers.  Here's one way:

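data _null_;
   /* Pass 1: count the rows in the current acct_name group */
   n_accounts = 0;
   do until (last.acct_name);
      set dataIN;
      by acct_name;
      n_accounts + 1;
   end;

   file "C:\test_file.txt";

   /* Pass 2: re-read the same group and write it only if it is small enough */
   do until (last.acct_name);
      set dataIN;
      by acct_name;
      if n_accounts <= 1000 then put
         @001 acct_name $50.
         @051 address $40.
      ;
   end;
run;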

You are reading the data twice, but there isn't any way around that.

sasboy007
Calcite | Level 5

Hey Astounding,

Thanks for the loop; that was helpful.  Can I take it a step further?  Say I don't want to completely remove an account with more than 1000 rows, and instead want a WHILE loop that stops writing at a specific point.  Below is what I have, but it doesn't stop at 500 rows or fewer for each account.  Thanks.

data _null_;
   file "C:\test_file.txt";

   n_accounts=0;
   do until (last.acct_name);
      set dataIN;
      by acct_name;
      n_accounts + 1;
   end;

   do while(n_accounts <= 500);
      set dataIN;
      by acct_name;
      put
         @001 acct_name $50.
         @051 address $40.
      ;
   end;
run;
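
A possible explanation for why the WHILE version does not stop at 500: n_accounts is never changed inside the second loop, so the condition n_accounts <= 500 is either true on every pass or false from the start. When it is true, the loop keeps reading past the end of the current account until SET runs out of observations. When it is false, the loop is skipped entirely and the second SET no longer lines up with the first. The sketch below is one way to cap the output instead. It assumes the goal is to write only the first 500 rows of every account and that dataIN is sorted by acct_name. The counter name n_written is hypothetical.

```sas
data _null_;
   set dataIN;
   by acct_name;
   file "C:\test_file.txt";

   /* n_written is a hypothetical counter, reset at the start of each account */
   if first.acct_name then n_written = 0;

   /* the sum statement retains n_written across rows */
   n_written + 1;

   /* write only the first 500 rows of each account (assumed goal) */
   if n_written <= 500 then put
      @001 acct_name $50.
      @051 address $40.
   ;
run;
```

Because the cap depends only on a row's position within its account, this version reads dataIN once and does not need the counting pass.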

LearnByMistk
Obsidian | Level 7

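proc sql;
   /* keep only the groups with at most 1000 rows; */
   /* PROC SQL remerges the count with the detail rows, so select * works */
   create table totarecs as
   select * from test group by id having count(*) <= 1000;
quit;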

stat_sas
Ammonite | Level 13

If the data is sorted by acct_name, this will write only the acct_names that occur exactly once to test_file.txt:

data _null_;
   set dataIN;
   file "C:\test_file.txt";
   by acct_name;
   if first.acct_name and last.acct_name;
   put
      @001 acct_name  $50.
      @051 address $50.
   ;
run;
