BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Stretlow
Obsidian | Level 7

Hello all.

Ok I'm returning to SAS after 5 years of not using it and am self training again from scratch.

I've got my basic datastep stuff sorted however some of the proc procedures have completely left my mind.

I've read quite a lot of stuff but am struggling to find what I want and I know its really simple (please don't mock!)

I have a data table that contains various variables

3 of those are (with sample observations)


AuthDate    AccountID  

13Feb2014  123456      

12Feb2014  321565     

12Feb2014  123456      

12Feb2014  321565     


What im trying do is count where the same accountid appears >1 on the same day and show the name so the count from this small output would be 1 (321565 on 12Feb2014)

Appreciate any help.

Stret

1 ACCEPTED SOLUTION

Accepted Solutions
TarunKumar
Pyrite | Level 9

data have;

input AuthDate:$9.    AccountID:$6. Amount;

datalines;

13Feb2014  123456 20    

12Feb2014  321565   50 

12Feb2014  123456    10 

12Feb2014  321565   60

run;

proc sql;

create table want as

select distinct AuthDate,AccountID from have

group by AuthDate,AccountID

having count(AccountID) >1 and sum(Amount) > 40;

quit;

View solution in original post

13 REPLIES 13
Reeza
Super User

Specify a table statement, and then add a where clause to your out dataset to keep only records of interest, ie count>1

proc freq data=have;

table authdate*accountID/out=want(where=(count>1)) missing;

run;

jwillis
Quartz | Level 8

Here's an example using a sort and a datastep.  The code is written so that it only considers two duplicates per accountid. If there are more than 2 rows of accountids, you'll need to check for first.accountid = 0 and last.accountid = 0.  I defaulted count to 0.  You might want to default it to 1 until you are certain you have captured all combinations of first and last.  For example, if first accont = 0 and last account = 0 and first date = 0 and last date  = 0, then you would make count = 0 so that you do not double count when you reach last account and last date.

data start;

infile datalines truncover;

  input AuthDate date9. AccountID $;

datalines;

13Feb2014 123456      

12Feb2014 321565     

12Feb2014 123456      

12Feb2014 321565

;

run;


proc sort data=start;
by accountid authdate;
run;

data finish;
   set start;
   format authdate date9.;
   by accountid authdate;
   count =0;
   if first.accountid = 1 and last.accountid = 1 then count=0;
   if first.accountid = 1 and last.accountid = 0 then do;
      if first.authdate=1 and last.authdate  = 1 then count=0;
   end;
   if first.accountid = 1 and last.accountid = 0 then do;
      if first.authdate=1 and last.authdate  = 1 then count=0;
      if first.authdate=1 and last.authdate  = 0 then count=0;
   end;
   if first.accountid = 0 and last.accountid = 1 then do;
      if first.authdate=1 and last.authdate  = 1 then count=0;
      if first.authdate=0 and last.authdate  = 1 then count=1;
   end;

run;

Stretlow
Obsidian | Level 7

Thank you for your swift responses on this guys.

I'll give them both a whirl later today and report back.

Thank you for your help

Stretlow
Obsidian | Level 7

Unfortunately I cant get either of those to do what I need.  the First proc freq killes my pc!!!

I just need a simple count of where the account id appears on the same day more than once so if there was two on the 1st and and two on the second to the same accountid the  count would be 2.

Any other suggestions?

Really do appreciate the help

Stret

TarunKumar
Pyrite | Level 9

data have;

input AuthDate:$9.    AccountID:$6.;

datalines;

13Feb2014  123456     

12Feb2014  321565    

12Feb2014  123456     

12Feb2014  321565 

run;

proc sql;

create table want as

select * from have

group by AuthDate,AccountID

having count(AccountID) >1;

quit;

Stretlow
Obsidian | Level 7

I've never used proc SQL before however this does seem to be giving me the results I need on the count so thank you.

Last question.

Where your code says     having count(AccountID) >1;

If I had another variable in the table can you put multiple criteria in that so maybe if i wanted where count >1 and total amount >= 50, is that also possible in that statement?

Thanks again

TarunKumar
Pyrite | Level 9

yes depend on what you want and how data is.

Stretlow
Obsidian | Level 7

so for exmaple the fields are the same as above except I now have an extra variable named "Amount".

I might want to look where the accountID count is >1 on the same day (as above) and the total amount associated with those account ids >=40?

Stret

TarunKumar
Pyrite | Level 9

data have;

input AuthDate:$9.    AccountID:$6. Amount;

datalines;

13Feb2014  123456 20    

12Feb2014  321565   50 

12Feb2014  123456    10 

12Feb2014  321565   60

run;

proc sql;

create table want as

select distinct AuthDate,AccountID from have

group by AuthDate,AccountID

having count(AccountID) >1 and sum(Amount) > 40;

quit;

JVarghese
Obsidian | Level 7

proc sort data=data1;

     by authdate acctid;

run;

data new (drop=amt);

     retain totalamt ;

     set data1;

     by authdate acctid  ;

     if first.authdate & first.acctid then do;

                                                            count=1;

                                                            totalamt=amt;

                                                      end;

  if (last.authdate & last.acctid)or (first.authdate & last.acctid) then output;

  else do;

       count+1;

      totalamt+amt;

   end;

    

run;

proc print data=new noobs;

     var authdate acctid amt;  

     where count>1 and totalamt>=40;

run;

Try it with your modifications..

JVarghese
Obsidian | Level 7

I think Reeza's answer is  more efficient !

Tom
Super User Tom
Super User

You could also use PROC SUMMARY (which is really just PROC MEANS with NOPRINT option).

proc summary data=have nway ;

  class accountid authdate ;

  var amount ;

  output n=count sum=total out=want (where=(count > 1)) ;

run;

Note that if you have missing values of the AMOUNT variable then the value generated by the N statistic will be different than the value in the automatic variable _FREQ_ because the missing values will be excluded.

Stretlow
Obsidian | Level 7

Hi Chaps.

I've been off work for a week so not had chance to have ago at these (or thank you for your help)

I'll have a pop before the end of the week and report back.

Thanks Again

Stret

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 13 replies
  • 1600 views
  • 3 likes
  • 6 in conversation