BookmarkSubscribeRSS Feed
sasboy007
Calcite | Level 5

Hi all,

I have a data file with account related info.  What I'm trying to do is if an account number occurs more than once I want to delete that row.  How do I achieve this in the below data step.  thanks.

data _null_;
set dataIN;
file "c:\test_file.txt";

if  _freq_  >  2 then delete;   /* this is my addition to delete the row*/

put

@001 acct_name  $50.

@051 address $50.

;

Any help would be appreciated.  Thanks.

6 REPLIES 6
ballardw
Super User

Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.

Easiest would be

Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;

sasboy007
Calcite | Level 5

Sorry let me restate my question.  If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.

yeah the data is sorted by acct_name.

Astounding
PROC Star

You'll need a step to count occurrences of account numbers.  Here's one way:

data _null_;

n_accounts=0;

do until (last.acct_name);

   set dataIN;

   by acct_name;

   n_accounts + 1;

end;

file "C:\test_file.txt";

do until (last.acct_name);

  set dataIN;

  by acct_name;

  if n_accounts <= 1000 then put

  @001 acct_name $50.

  @051 address $40.

  ;

end;

run;

You are reading the data twice, but there isn't any way around that.

sasboy007
Calcite | Level 5

Hey Astounding,

Thanks for the loop statement that was helpful.  Can I take it a step further?  Let's say I didn't want to completely remove that account with 1000 rows.  If I wanted to implement a WHILE statement to end at a specific point.  The below is what I have, but doesn't stop at 500 or less for each account.  Thanks.

data _null_;

file "C:\test_file.txt";

n_accounts=0;

do until (last.acct_name);

   set dataIN;

   by acct_name;

   n_accounts + 1;

end;

do while(n_accounts <= 500);

  set dataIN;

  by acct_name;

put

@001 acct_name $50.

@051 address $40.

  ;

end;

run;

LearnByMistk
Obsidian | Level 7

proc sql;

   create table totarecs as

   select * from test where  group by id having count<1000;

   quit;

stat_sas
Ammonite | Level 13

If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt

data _null_;

set dataIN;

file "C:\test_file.txt";

by acct_name;

if first.acct_name and last.acct_name;

put

@001 acct_name  $50.

@051 address $50.

;

run;

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 2857 views
  • 3 likes
  • 5 in conversation