Hello all.
Ok I'm returning to SAS after 5 years of not using it and am self training again from scratch.
I've got my basic datastep stuff sorted however some of the proc procedures have completely left my mind.
I've read quite a lot of stuff but am struggling to find what I want and I know its really simple (please don't mock!)
I have a data table that contains various variables
3 of those are (with sample observations)
AuthDate AccountID
13Feb2014 123456
12Feb2014 321565
12Feb2014 123456
12Feb2014 321565
What im trying do is count where the same accountid appears >1 on the same day and show the name so the count from this small output would be 1 (321565 on 12Feb2014)
Appreciate any help.
Stret
data have;
input AuthDate:$9. AccountID:$6. Amount;
datalines;
13Feb2014 123456 20
12Feb2014 321565 50
12Feb2014 123456 10
12Feb2014 321565 60
run;
proc sql;
create table want as
select distinct AuthDate,AccountID from have
group by AuthDate,AccountID
having count(AccountID) >1 and sum(Amount) > 40;
quit;
Specify a table statement, and then add a where clause to your out dataset to keep only records of interest, ie count>1
proc freq data=have;
table authdate*accountID/out=want(where=(count>1)) missing;
run;
Here's an example using a sort and a datastep. The code is written so that it only considers two duplicates per accountid. If there are more than 2 rows of accountids, you'll need to check for first.accountid = 0 and last.accountid = 0. I defaulted count to 0. You might want to default it to 1 until you are certain you have captured all combinations of first and last. For example, if first accont = 0 and last account = 0 and first date = 0 and last date = 0, then you would make count = 0 so that you do not double count when you reach last account and last date.
data start;
infile datalines truncover;
input AuthDate date9. AccountID $;
datalines;
13Feb2014 123456
12Feb2014 321565
12Feb2014 123456
12Feb2014 321565
;
run;
proc sort data=start;
by accountid authdate;
run;
data finish;
set start;
format authdate date9.;
by accountid authdate;
count =0;
if first.accountid = 1 and last.accountid = 1 then count=0;
if first.accountid = 1 and last.accountid = 0 then do;
if first.authdate=1 and last.authdate = 1 then count=0;
end;
if first.accountid = 1 and last.accountid = 0 then do;
if first.authdate=1 and last.authdate = 1 then count=0;
if first.authdate=1 and last.authdate = 0 then count=0;
end;
if first.accountid = 0 and last.accountid = 1 then do;
if first.authdate=1 and last.authdate = 1 then count=0;
if first.authdate=0 and last.authdate = 1 then count=1;
end;
run;
Thank you for your swift responses on this guys.
I'll give them both a whirl later today and report back.
Thank you for your help
Unfortunately I cant get either of those to do what I need. the First proc freq killes my pc!!!
I just need a simple count of where the account id appears on the same day more than once so if there was two on the 1st and and two on the second to the same accountid the count would be 2.
Any other suggestions?
Really do appreciate the help
Stret
data have;
input AuthDate:$9. AccountID:$6.;
datalines;
13Feb2014 123456
12Feb2014 321565
12Feb2014 123456
12Feb2014 321565
run;
proc sql;
create table want as
select * from have
group by AuthDate,AccountID
having count(AccountID) >1;
quit;
I've never used proc SQL before however this does seem to be giving me the results I need on the count so thank you.
Last question.
Where your code says having count(AccountID) >1;
If I had another variable in the table can you put multiple criteria in that so maybe if i wanted where count >1 and total amount >= 50, is that also possible in that statement?
Thanks again
yes depend on what you want and how data is.
so for exmaple the fields are the same as above except I now have an extra variable named "Amount".
I might want to look where the accountID count is >1 on the same day (as above) and the total amount associated with those account ids >=40?
Stret
data have;
input AuthDate:$9. AccountID:$6. Amount;
datalines;
13Feb2014 123456 20
12Feb2014 321565 50
12Feb2014 123456 10
12Feb2014 321565 60
run;
proc sql;
create table want as
select distinct AuthDate,AccountID from have
group by AuthDate,AccountID
having count(AccountID) >1 and sum(Amount) > 40;
quit;
proc sort data=data1;
by authdate acctid;
run;
data new (drop=amt);
retain totalamt ;
set data1;
by authdate acctid ;
if first.authdate & first.acctid then do;
count=1;
totalamt=amt;
end;
if (last.authdate & last.acctid)or (first.authdate & last.acctid) then output;
else do;
count+1;
totalamt+amt;
end;
run;
proc print data=new noobs;
var authdate acctid amt;
where count>1 and totalamt>=40;
run;
Try it with your modifications..
I think Reeza's answer is more efficient !
You could also use PROC SUMMARY (which is really just PROC MEANS with NOPRINT option).
proc summary data=have nway ;
class accountid authdate ;
var amount ;
output n=count sum=total out=want (where=(count > 1)) ;
run;
Note that if you have missing values of the AMOUNT variable then the value generated by the N statistic will be different than the value in the automatic variable _FREQ_ because the missing values will be excluded.
Hi Chaps.
I've been off work for a week so not had chance to have ago at these (or thank you for your help)
I'll have a pop before the end of the week and report back.
Thanks Again
Stret
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.