Desktop productivity for business analysts and programmers

Ranking top 10 occurrences from a table

Accepted Solution Solved
Reply
Contributor
Posts: 64
Accepted Solution

Ranking top 10 occurrences from a table

Hi all,

 

I have a email campaign activity table and I want to calculate the top ten links people have clicked on grouped by each individual campaign ID. I cannot simply do a order by function because I think it will take the overall order across the whole table. I need the top ten unique to table a, the top 10 unique to table b, etc...

 

 

I would post an example worksheet but I have sensitive data that I am unable to present. I hope I explained my need well enough but feel free to ask for further clarification. 

 

Thanks for any help!


Accepted Solutions
Solution
‎03-05-2018 12:16 PM
Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

Use PROC FREQ to calculate the counts. 

 

proc freq data=have noprint;
table campaign*link out=link_freq;
run;

proc sort data=link_freq;
by descending count;
run;

data want;
set link_freq;
by campaign;
if first.campaign then count=0;
else count+1;
if count <= 10;
run;

View solution in original post


All Replies
Respected Advisor
Posts: 3,249

Re: Ranking top 10 occurrences from a table

Without being able to see your data, let me suggest you use PROC RANK, and use a BY statement to separate the RANKS by table.

--
Paige Miller
Solution
‎03-05-2018 12:16 PM
Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

Use PROC FREQ to calculate the counts. 

 

proc freq data=have noprint;
table campaign*link out=link_freq;
run;

proc sort data=link_freq;
by descending count;
run;

data want;
set link_freq;
by campaign;
if first.campaign then count=0;
else count+1;
if count <= 10;
run;
Super User
Posts: 6,899

Re: Ranking top 10 occurrences from a table

@Reeza has the right idea.  There are just a few spots where you need to clean up the code:

 

proc freq data=have noprint;
table campaign*link / out=link_freq;
run;

proc sort data=link_freq;
by campaign descending count;
run;

data want;
set link_freq;
by campaign;
if first.campaign then cnt=1;
else cnt+1;
if cnt <= 10;
run;
Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

Posted in reply to Astounding

Thanks @Astounding! Without data it's easy to skip steps sometimes Smiley Happy

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

[ Edited ]
Posted in reply to Astounding

Hi @Astounding,

 

This seems to work. However, the last part seems to only bring back the top 10 rows from the link_freq table. I am looking to get the top 10 per campaign_id.

 

Regards

Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

Can you post the exact code you used?

By Campaign should return results for every campaign.

 


Dogo23 wrote:

Hi @Astounding,

 

This seems to work. However, the last part seems to only bring back the top 10 rows from the link_freq table. I am looking to get the top 10 per campaign_id.

 

Regards


 

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

[ Edited ]
proc freq
data=tableau.activity_match noprint;
table marketing_history_id*link_name / out=link_freq;
run;

proc sort data=link_freq;
by marketing_history_id descending count;
run;

data want;
set link_freq;
by marketing_history_id;
if marketing_history_id then cnt=1;
else cnt+1;
if cnt <= 10;
run;

Capture.PNG

Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

It looks correct to me, and it seems like the answer/output is correct if that's the screenshot. The first entry has only 8, so maybe that campaign only had 8 entries for some reason?

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

You're correct that the first entry only had 8 records. However, there are are 500+ of those campaign_ids (marketing_history_id) that are not coming through.

 

I was thinking this should have the top ten for each one. If I change cnt from <=10 to say something like 20, it just brings in the next 20 rows from that table instead. It's almost like saying limit to first 10 rows from the table when I really need the top 10 by ID. Hope that makes sense. 

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

[ Edited ]

@ReezaHere's the data I pulling from in that second step. As you can see, there are more ID's that should be pulled into that final output. 

 

Capture.PNG

Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

Include the log from the code please, the code appears correct...

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

@Reeza

 

1 ;*';*";*/;quit;run;
2 OPTIONS PAGENO=MIN;
3 %LET _CLIENTTASKLABEL='Program';
4 %LET _CLIENTPROJECTPATH='C:\Users\1364503\Desktop\Dataset For Tableau (002).egp';
5 %LET _CLIENTPROJECTNAME='Dataset For Tableau (002).egp';
6 %LET _SASPROGRAMFILE=;
7
8 ODS _ALL_ CLOSE;
9 OPTIONS DEV=ACTIVEX;
10 GOPTIONS XPIXELS=0 YPIXELS=0;
11 FILENAME EGSR TEMP;
12 ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
13 STYLE=HtmlBlue
14 STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
15 NOGTITLE
16 NOGFOOTNOTE
17 GPATH=&sasworklocation
18 ENCODING=UTF8
19 options(rolap="on")
20 ;
NOTE: Writing TAGSETS.SASREPORT13(EGSR) Body file: EGSR
21
22 GOPTIONS ACCESSIBLE;
23 proc freq
24 data=tableau.activity_match noprint;
25 table marketing_history_id*link_name / out=link_freq;
26 run;

NOTE: There were 25567356 observations read from the data set TABLEAU.ACTIVITY_MATCH.
NOTE: The data set WORK.LINK_FREQ has 154424 observations and 4 variables.
NOTE: PROCEDURE FREQ used (Total process time):
real time 20.51 seconds
cpu time 20.52 seconds

27
28 proc sort data=link_freq;
29 by marketing_history_id descending count;
30 run;

NOTE: There were 154424 observations read from the data set WORK.LINK_FREQ.
NOTE: The data set WORK.LINK_FREQ has 154424 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.20 seconds
cpu time 0.20 seconds

31
32 data want;
33 set link_freq;
34 by marketing_history_id;
35 if marketing_history_id then cnt=1;
36 else cnt+1;
37 if cnt <= 10;
38 run;

NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).
35:4
2 The SAS System 10:02 Monday, March 5, 2018

NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=MAINCTA COUNT=16 PERCENT=0.0000625798 FIRST.MARKETING_HISTORY_ID=1
LAST.MARKETING_HISTORY_ID=0 cnt=1 _ERROR_=1 _N_=1
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTMYACCCT COUNT=16 PERCENT=0.0000625798 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=2 _ERROR_=1 _N_=2
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTRDMPTS COUNT=16 PERCENT=0.0000625798 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=3 _ERROR_=1 _N_=3
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTRATES COUNT=12 PERCENT=0.0000469348 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=4 _ERROR_=1 _N_=4
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTSPECIAL COUNT=12 PERCENT=0.0000469348 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=5 _ERROR_=1 _N_=5
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTS0105AB COUNT=4 PERCENT=0.0000156449 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=6 _ERROR_=1 _N_=6
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTS0105AM COUNT=4 PERCENT=0.0000156449 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=7 _ERROR_=1 _N_=7
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146825EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146825EN LINK_NAME=TRTS0401AM COUNT=4 PERCENT=0.0000156449 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=1 cnt=8 _ERROR_=1 _N_=8
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTS0106AM COUNT=4 PERCENT=0.0000156449 FIRST.MARKETING_HISTORY_ID=1
LAST.MARKETING_HISTORY_ID=0 cnt=9 _ERROR_=1 _N_=9
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN
LINK_NAME=LINK 01 - /CONTENT/GP/EN/HOME.HTML?src={(META:EMAIL_TYPE)}_EMAIL_{(META:CAMPAIGNNAME)}_{(TERRITORY_EMAIL_VERSION)}_{(TRANS
LATION_LANG)}_{(META:MAIL_DATE)}_SLOT1PR3_{(MARKETING_HISTORY_ID)}_{(LP_MBR_ACCT)} COUNT=2 PERCENT=7.8224749E-6
FIRST.MARKETING_HISTORY_ID=0 LAST.MARKETING_HISTORY_ID=0 cnt=10 _ERROR_=1 _N_=10
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=MAINCTA COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=11 _ERROR_=1 _N_=11
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTMANAGES COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=12 _ERROR_=1 _N_=12
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTMYACCCT COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=13 _ERROR_=1 _N_=13
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTS0106AB COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=14 _ERROR_=1 _N_=14
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTS0401AB COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=15 _ERROR_=1 _N_=15
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592146826EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592146826EN LINK_NAME=TRTSPECIAL COUNT=2 PERCENT=7.8224749E-6 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=1 cnt=16 _ERROR_=1 _N_=16
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592148156EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592148156EN LINK_NAME=TRTMYACCCT COUNT=28 PERCENT=0.0001095146 FIRST.MARKETING_HISTORY_ID=1
LAST.MARKETING_HISTORY_ID=0 cnt=17 _ERROR_=1 _N_=17
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592148156EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592148156EN LINK_NAME=TRTS0105AB COUNT=12 PERCENT=0.0000469348 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=18 _ERROR_=1 _N_=18
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592148156EN' , at line 35 column 4.
MARKETING_HISTORY_ID=A00487002592148156EN LINK_NAME=TRTS0502AM COUNT=8 PERCENT=0.0000312899 FIRST.MARKETING_HISTORY_ID=0
3 The SAS System 10:02 Monday, March 5, 2018

LAST.MARKETING_HISTORY_ID=0 cnt=19 _ERROR_=1 _N_=19
NOTE: Invalid numeric data, MARKETING_HISTORY_ID='A00487002592148156EN' , at line 35 column 4.
WARNING: Limit set by ERRORS= option reached. Further errors of this type will not be printed.
MARKETING_HISTORY_ID=A00487002592148156EN LINK_NAME=TRTSPECIAL COUNT=8 PERCENT=0.0000312899 FIRST.MARKETING_HISTORY_ID=0
LAST.MARKETING_HISTORY_ID=0 cnt=20 _ERROR_=1 _N_=20
NOTE: There were 154424 observations read from the data set WORK.LINK_FREQ.
NOTE: The data set WORK.WANT has 10 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.06 seconds
cpu time 0.07 seconds

39
40 GOPTIONS NOACCESSIBLE;
41 %LET _CLIENTTASKLABEL=;
42 %LET _CLIENTPROJECTPATH=;
43 %LET _CLIENTPROJECTNAME=;
44 %LET _SASPROGRAMFILE=;
45
46 ;*';*";*/;quit;run;
47 ODS _ALL_ CLOSE;
48
49
50 QUIT; RUN;
51

Super User
Posts: 23,928

Re: Ranking top 10 occurrences from a table

huh...anything in there look like it's flagging something for you?

 

NOTE: Character values have been converted to numeric values at the places given by: (Line)Smiley SadColumn).
35:4

Finding line 35/4

35 if marketing_history_id then cnt=1;

 

You forgot the first. portion of that code. Fixing that will fix the issue. Check the original code provided for reference.

Contributor
Posts: 64

Re: Ranking top 10 occurrences from a table

Hi @Reeza

 

I didn't have the fist (bold in code below) in the step. I didn't realize that had to be in the code, but that brought in all the other records.

 

Thanks for your help!

 

data want;
set link_freq;
by marketing_history_id;
if first.marketing_history_id then cnt=1;
else cnt+1;
if cnt <= 10;
run;

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 250 views
  • 2 likes
  • 4 in conversation