Re: deleting rows in data step

sasboy007 · Posted 10-10-2014 01:47 PM

Hi all,

I have a data file with account related info. What I'm trying to do is if an account number occurs more than once I want to delete that row. How do I achieve this in the below data step. thanks.

data _null_;
set dataIN;
file "c:\test_file.txt";

if _freq_ > 2 then delete; /* this is my addition to delete the row*/

put

@001 acct_name $50.

@051 address $50.

;

Any help would be appreciated. Thanks.

ballardw · Posted 10-10-2014 01:57 PM

Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.

Easiest would be

Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;

sasboy007 · Posted 10-10-2014 02:05 PM

Sorry let me restate my question. If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.

yeah the data is sorted by acct_name.

Astounding · Posted 10-10-2014 02:48 PM

You'll need a step to count occurrences of account numbers. Here's one way:

data _null_;

n_accounts=0;

do until (last.acct_name);

set dataIN;

by acct_name;

n_accounts + 1;

end;

file "C:\test_file.txt";

do until (last.acct_name);

set dataIN;

by acct_name;

if n_accounts <= 1000 then put

@001 acct_name $50.

@051 address $40.

;

end;

run;

You are reading the data twice, but there isn't any way around that.

sasboy007 · Posted 10-13-2014 03:33 PM

Hey Astounding,

Thanks for the loop statement that was helpful. Can I take it a step further? Let's say I didn't want to completely remove that account with 1000 rows. If I wanted to implement a WHILE statement to end at a specific point. The below is what I have, but doesn't stop at 500 or less for each account. Thanks.

data _null_;

file "C:\test_file.txt";

n_accounts=0;

do until (last.acct_name);

set dataIN;

by acct_name;

n_accounts + 1;

end;

do while(n_accounts <= 500);

set dataIN;

by acct_name;

put

@001 acct_name $50.

@051 address $40.

;

end;

run;

LearnByMistk · Posted 10-10-2014 03:08 PM

proc sql;

create table totarecs as

select * from test where group by id having count<1000;

quit;

stat_sas · Posted 10-10-2014 02:40 PM

If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt

data _null_;

set dataIN;

file "C:\test_file.txt";

by acct_name;

if first.acct_name and last.acct_name;

put

@001 acct_name $50.

@051 address $50.

;

run;

deleting rows in data step