Hi all,
I have a data file with account related info. What I'm trying to do is if an account number occurs more than once I want to delete that row. How do I achieve this in the below data step. thanks.
data _null_;
set dataIN;
file "c:\test_file.txt";
if _freq_ > 2 then delete; /* this is my addition to delete the row*/
put
@001 acct_name $50.
@051 address $50.
;
Any help would be appreciated. Thanks.
Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.
Easiest would be
Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;
Sorry let me restate my question. If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.
yeah the data is sorted by acct_name.
You'll need a step to count occurrences of account numbers. Here's one way:
data _null_;
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
file "C:\test_file.txt";
do until (last.acct_name);
set dataIN;
by acct_name;
if n_accounts <= 1000 then put
@001 acct_name $50.
@051 address $40.
;
end;
run;
You are reading the data twice, but there isn't any way around that.
Hey Astounding,
Thanks for the loop statement that was helpful. Can I take it a step further? Let's say I didn't want to completely remove that account with 1000 rows. If I wanted to implement a WHILE statement to end at a specific point. The below is what I have, but doesn't stop at 500 or less for each account. Thanks.
data _null_;
file "C:\test_file.txt";
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
do while(n_accounts <= 500);
set dataIN;
by acct_name;
put
@001 acct_name $50.
@051 address $40.
;
end;
run;
proc sql;
create table totarecs as
select * from test where group by id having count<1000;
quit;
If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt
data _null_;
set dataIN;
file "C:\test_file.txt";
by acct_name;
if first.acct_name and last.acct_name;
put
@001 acct_name $50.
@051 address $50.
;
run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.