BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Sth19
Calcite | Level 5

I am trying to count the amount of times an email occurs on different accounts.  Note that my data set contains transnational info, so an account can appear multiple times.  I am not concerned with counting the occurrence of an email on the same account, however counting the same email on multiple accounts is disirable.    

 

 

I have the following code: 

PROC SORT DATA=WORK.TARG;
	BY ACCESS_NUMBER;
RUN;

DATA LOOK_AHEAD;
	SET WORK.TARG;
	BY ACCESS_NUMBER;

	SET TARG  ( firstobs = 2 keep = ACCESS_NUMBER rename = (ACCESS_NUMBER = Next_ACC_NUM) )
        TARG	  (      obs = 1 drop = _all_                                     );

RUN;

PROC SORT DATA=WORK.LOOK_AHEAD;
	BY ACCESS_NUMBER OR_email_parsed;
run;

DATA COUNT_EMAIL;
	SET WORK.LOOK_AHEAD;
	by access_number OR_email_parsed;
	COUNT=0;
	IF (INPUT(ACCESS_NUMBER,18.)^=INPUT(NEXT_ACC_NUM,18.)) THEN DO;
		IF first.or_email_parsed then count+0;
		count+1;
		if last.or_email_parsed then output;
	end;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

This might not be fastest, but it should get the job done easily enough.

 

proc freq data=have;

tables email * access_number / noprint out=combinations;

run;

proc freq data=combinations;

tables email / noprint out=final_counts;

run;

 

data want;

merge combinations (keep=email access_number)

final_counts (keep=email count rename=(count=counter));

by email;

run;

 

Then optionally:

 

proc sort data=want;

by access_number email;

run;

View solution in original post

8 REPLIES 8
Reeza
Super User

On the data step count_email,  the error occurs on the first if statement.

 

What error? You haven't stated any errors or provided the log. 

You could probably simplify this using the LAG() function as well.

 

 


@Sth19 wrote:

I am trying to count the amount of times an email occurs on different accounts.  Note that my data set contains transnational info, so an account can appear multiple times.  I am not concerned with counting the occurrence of an email on the same account, however counting the same email on multiple accounts is disirable.    

 

 

I have the following code: 

PROC SORT DATA=WORK.TARG;
	BY ACCESS_NUMBER;
RUN;

DATA LOOK_AHEAD;
	SET WORK.TARG;
	BY ACCESS_NUMBER;

	SET TARG  ( firstobs = 2 keep = ACCESS_NUMBER rename = (ACCESS_NUMBER = Next_ACC_NUM) )
        TARG	  (      obs = 1 drop = _all_                                     );

RUN;

PROC SORT DATA=WORK.LOOK;
	BY ACCESS_NUMBER OR_email_parsed;
run;

DATA COUNT_EMAIL;
	SET WORK.LOOK_AHEAD;
	by access_number OR_email_parsed;
	COUNT=0;
	IF (INPUT(ACCESS_NUMBER,18.)^=INPUT(NEXT_ACC_NUM,18.) THEN DO;
		IF first.or_email_parsed then count+0;
		count+1;
		if last.or_email_parsed then output;
	end;
run;

On the data step count_email,  the error occurs on the first if statement.

 

 




Astounding
PROC Star

This isn't a comment about your plan, or about your syntax, other than the source of the error.

 

Your IF statement contains unbalanced parentheses.  It's easy to fix, you just have to notice it.

novinosrin
Tourmaline | Level 20

Hi @Sth19, If you could post a sample data of HAVE and WANT, even lazy people like me would be interested to attempt various solutions from which you would have the choice of picking the best you think that suits your need. Thank you!

Sth19
Calcite | Level 5

Have:

 

Access_Number  Email

1                          abc@123.com

1                          abc@123.com

1                          def@123.com

2                          abc@123.com

3                          efg@123.com

4                          abc@123.com

4                          efg@123.com

 

Want 

 

Access_Number Email                   Counter

1                          abc@123.com    3

1                          def@123.com     1

2                          abc@123.com    3

3                          efg@123.com     2

4                          abc@123.com    3 

4                          efg@123.com     2

             

Reeza
Super User
  1. Count each email separately using PROC FREQ/SQL only on the email address. 
  2. Then get distinct email ID table, again, either using PROC FREQ or SQL. 
  3. Merge the results from Step 1 with Step 2, by email address.

@Sth19 wrote:

Have:

 

Access_Number  Email

1                          abc@123.com

1                          abc@123.com

1                          def@123.com

2                          abc@123.com

3                          efg@123.com

4                          abc@123.com

4                          efg@123.com

 

Want 

 

Access_Number Email                   Counter

1                          abc@123.com    3

1                          def@123.com     1

2                          abc@123.com    3

3                          efg@123.com     2

4                          abc@123.com    3 

4                          efg@123.com     2

             


 

Astounding
PROC Star

This might not be fastest, but it should get the job done easily enough.

 

proc freq data=have;

tables email * access_number / noprint out=combinations;

run;

proc freq data=combinations;

tables email / noprint out=final_counts;

run;

 

data want;

merge combinations (keep=email access_number)

final_counts (keep=email count rename=(count=counter));

by email;

run;

 

Then optionally:

 

proc sort data=want;

by access_number email;

run;

novinosrin
Tourmaline | Level 20
data have;
input Access_Number Email : $30.;
datalines;
1                          abc@123.com
1                          abc@123.com
1                          def@123.com
2                          abc@123.com
3                          efg@123.com
4                          abc@123.com
4                          efg@123.com
;

proc sort data=have out=_have nodupkey;
by Access_Number Email;
run;

proc sql;
create table want as
select *, count(email) as counter
from _have
group by email
order by Access_Number;
quit;
novinosrin
Tourmaline | Level 20
/*and if you wanna avoid the sort and make sql manage with distinct, have that as inline view as demonstrated below*/


proc sql;
create table want as
select *, count(email) as counter
from (select distinct Access_Number,Email from have)
group by email
order by Access_Number;
quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 988 views
  • 0 likes
  • 4 in conversation