DATA Step, Macro, Functions and more

Count the occurrence of the same value on different accounts

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 7
Accepted Solution

Count the occurrence of the same value on different accounts

[ Edited ]

I am trying to count the amount of times an email occurs on different accounts.  Note that my data set contains transnational info, so an account can appear multiple times.  I am not concerned with counting the occurrence of an email on the same account, however counting the same email on multiple accounts is disirable.    

 

 

I have the following code: 

PROC SORT DATA=WORK.TARG;
	BY ACCESS_NUMBER;
RUN;

DATA LOOK_AHEAD;
	SET WORK.TARG;
	BY ACCESS_NUMBER;

	SET TARG  ( firstobs = 2 keep = ACCESS_NUMBER rename = (ACCESS_NUMBER = Next_ACC_NUM) )
        TARG	  (      obs = 1 drop = _all_                                     );

RUN;

PROC SORT DATA=WORK.LOOK_AHEAD;
	BY ACCESS_NUMBER OR_email_parsed;
run;

DATA COUNT_EMAIL;
	SET WORK.LOOK_AHEAD;
	by access_number OR_email_parsed;
	COUNT=0;
	IF (INPUT(ACCESS_NUMBER,18.)^=INPUT(NEXT_ACC_NUM,18.)) THEN DO;
		IF first.or_email_parsed then count+0;
		count+1;
		if last.or_email_parsed then output;
	end;
run;

 

 

 


Accepted Solutions
Solution
‎02-22-2018 03:13 PM
Super User
Posts: 6,939

Re: Count the occurrence of the same value on different accounts

This might not be fastest, but it should get the job done easily enough.

 

proc freq data=have;

tables email * access_number / noprint out=combinations;

run;

proc freq data=combinations;

tables email / noprint out=final_counts;

run;

 

data want;

merge combinations (keep=email access_number)

final_counts (keep=email count rename=(count=counter));

by email;

run;

 

Then optionally:

 

proc sort data=want;

by access_number email;

run;

View solution in original post


All Replies
Super User
Posts: 24,027

Re: Count the occurrence of the same value on different accounts

On the data step count_email,  the error occurs on the first if statement.

 

What error? You haven't stated any errors or provided the log. 

You could probably simplify this using the LAG() function as well.

 

 


Sth19 wrote:

I am trying to count the amount of times an email occurs on different accounts.  Note that my data set contains transnational info, so an account can appear multiple times.  I am not concerned with counting the occurrence of an email on the same account, however counting the same email on multiple accounts is disirable.    

 

 

I have the following code: 

PROC SORT DATA=WORK.TARG;
	BY ACCESS_NUMBER;
RUN;

DATA LOOK_AHEAD;
	SET WORK.TARG;
	BY ACCESS_NUMBER;

	SET TARG  ( firstobs = 2 keep = ACCESS_NUMBER rename = (ACCESS_NUMBER = Next_ACC_NUM) )
        TARG	  (      obs = 1 drop = _all_                                     );

RUN;

PROC SORT DATA=WORK.LOOK;
	BY ACCESS_NUMBER OR_email_parsed;
run;

DATA COUNT_EMAIL;
	SET WORK.LOOK_AHEAD;
	by access_number OR_email_parsed;
	COUNT=0;
	IF (INPUT(ACCESS_NUMBER,18.)^=INPUT(NEXT_ACC_NUM,18.) THEN DO;
		IF first.or_email_parsed then count+0;
		count+1;
		if last.or_email_parsed then output;
	end;
run;

On the data step count_email,  the error occurs on the first if statement.

 

 




Super User
Posts: 6,939

Re: Count the occurrence of the same value on different accounts

This isn't a comment about your plan, or about your syntax, other than the source of the error.

 

Your IF statement contains unbalanced parentheses.  It's easy to fix, you just have to notice it.

Super User
Posts: 2,075

Re: Count the occurrence of the same value on different accounts

Hi @Sth19, If you could post a sample data of HAVE and WANT, even lazy people like me would be interested to attempt various solutions from which you would have the choice of picking the best you think that suits your need. Thank you!

Occasional Contributor
Posts: 7

Re: Count the occurrence of the same value on different accounts

Have:

 

Access_Number  Email

1                          abc@123.com

1                          abc@123.com

1                          def@123.com

2                          abc@123.com

3                          efg@123.com

4                          abc@123.com

4                          efg@123.com

 

Want 

 

Access_Number Email                   Counter

1                          abc@123.com    3

1                          def@123.com     1

2                          abc@123.com    3

3                          efg@123.com     2

4                          abc@123.com    3 

4                          efg@123.com     2

             

Super User
Posts: 24,027

Re: Count the occurrence of the same value on different accounts

  1. Count each email separately using PROC FREQ/SQL only on the email address. 
  2. Then get distinct email ID table, again, either using PROC FREQ or SQL. 
  3. Merge the results from Step 1 with Step 2, by email address.

Sth19 wrote:

Have:

 

Access_Number  Email

1                          abc@123.com

1                          abc@123.com

1                          def@123.com

2                          abc@123.com

3                          efg@123.com

4                          abc@123.com

4                          efg@123.com

 

Want 

 

Access_Number Email                   Counter

1                          abc@123.com    3

1                          def@123.com     1

2                          abc@123.com    3

3                          efg@123.com     2

4                          abc@123.com    3 

4                          efg@123.com     2

             


 

Solution
‎02-22-2018 03:13 PM
Super User
Posts: 6,939

Re: Count the occurrence of the same value on different accounts

This might not be fastest, but it should get the job done easily enough.

 

proc freq data=have;

tables email * access_number / noprint out=combinations;

run;

proc freq data=combinations;

tables email / noprint out=final_counts;

run;

 

data want;

merge combinations (keep=email access_number)

final_counts (keep=email count rename=(count=counter));

by email;

run;

 

Then optionally:

 

proc sort data=want;

by access_number email;

run;

Super User
Posts: 2,075

Re: Count the occurrence of the same value on different accounts

data have;
input Access_Number Email : $30.;
datalines;
1                          abc@123.com
1                          abc@123.com
1                          def@123.com
2                          abc@123.com
3                          efg@123.com
4                          abc@123.com
4                          efg@123.com
;

proc sort data=have out=_have nodupkey;
by Access_Number Email;
run;

proc sql;
create table want as
select *, count(email) as counter
from _have
group by email
order by Access_Number;
quit;
Super User
Posts: 2,075

Re: Count the occurrence of the same value on different accounts

Posted in reply to novinosrin
/*and if you wanna avoid the sort and make sql manage with distinct, have that as inline view as demonstrated below*/


proc sql;
create table want as
select *, count(email) as counter
from (select distinct Access_Number,Email from have)
group by email
order by Access_Number;
quit;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 129 views
  • 0 likes
  • 4 in conversation