Hi all,
I have a data file with account related info. What I'm trying to do is if an account number occurs more than once I want to delete that row. How do I achieve this in the below data step. thanks.
data _null_;
set dataIN;
file "c:\test_file.txt";
if _freq_ > 2 then delete; /* this is my addition to delete the row*/
put
@001 acct_name $50.
@051 address $50.
;
Any help would be appreciated. Thanks.
Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.
Easiest would be
Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;
Sorry let me restate my question. If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.
yeah the data is sorted by acct_name.
You'll need a step to count occurrences of account numbers. Here's one way:
data _null_;
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
file "C:\test_file.txt";
do until (last.acct_name);
set dataIN;
by acct_name;
if n_accounts <= 1000 then put
@001 acct_name $50.
@051 address $40.
;
end;
run;
You are reading the data twice, but there isn't any way around that.
Hey Astounding,
Thanks for the loop statement that was helpful. Can I take it a step further? Let's say I didn't want to completely remove that account with 1000 rows. If I wanted to implement a WHILE statement to end at a specific point. The below is what I have, but doesn't stop at 500 or less for each account. Thanks.
data _null_;
file "C:\test_file.txt";
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
do while(n_accounts <= 500);
set dataIN;
by acct_name;
put
@001 acct_name $50.
@051 address $40.
;
end;
run;
proc sql;
create table totarecs as
select * from test where group by id having count<1000;
quit;
If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt
data _null_;
set dataIN;
file "C:\test_file.txt";
by acct_name;
if first.acct_name and last.acct_name;
put
@001 acct_name $50.
@051 address $50.
;
run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.