Hi all,
I have a data file with account related info. What I'm trying to do is if an account number occurs more than once I want to delete that row. How do I achieve this in the below data step. thanks.
data _null_;
set dataIN;
file "c:\test_file.txt";
if _freq_ > 2 then delete; /* this is my addition to delete the row*/
put
@001 acct_name $50.
@051 address $50.
;
Any help would be appreciated. Thanks.
Is the data sorted by the information that determines a duplicate? If not you are asking for a lot if there are more than a few records.
Easiest would be
Proc Sort data=yourdata out=want nodupkey; by <list of variables that determine uniqueness>; run;
Sorry let me restate my question. If an acct_name occurs more than 1000 times i need to delete all rows for that given account from the file.
yeah the data is sorted by acct_name.
You'll need a step to count occurrences of account numbers. Here's one way:
data _null_;
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
file "C:\test_file.txt";
do until (last.acct_name);
set dataIN;
by acct_name;
if n_accounts <= 1000 then put
@001 acct_name $50.
@051 address $40.
;
end;
run;
You are reading the data twice, but there isn't any way around that.
Hey Astounding,
Thanks for the loop statement that was helpful. Can I take it a step further? Let's say I didn't want to completely remove that account with 1000 rows. If I wanted to implement a WHILE statement to end at a specific point. The below is what I have, but doesn't stop at 500 or less for each account. Thanks.
data _null_;
file "C:\test_file.txt";
n_accounts=0;
do until (last.acct_name);
set dataIN;
by acct_name;
n_accounts + 1;
end;
do while(n_accounts <= 500);
set dataIN;
by acct_name;
put
@001 acct_name $50.
@051 address $40.
;
end;
run;
proc sql;
create table totarecs as
select * from test where group by id having count<1000;
quit;
If data is sorted by acct_name then this will retain acct_names occuring once in test_file.txt
data _null_;
set dataIN;
file "C:\test_file.txt";
by acct_name;
if first.acct_name and last.acct_name;
put
@001 acct_name $50.
@051 address $50.
;
run;
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.