turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- SAS Procedures
- /
- Getting count of distinct values using Proc Report

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-27-2010 12:10 PM

Hi,

I need to get the count of distinct values of a variable in my proc report. How can we do it? Using the statistic 'n' is giving me the count of the number of rows but I want the count of the distinct values as some values in my data would be repeating. For ex: if there are 2 variables Region and AgentID and the data contains multiple records for each agentID in each region ( say, there are totally 5 regions) and I want the report to show the number of agents in each region then how can i show it using proc report? Also, I want to show the percent of agents in each region ( which I think I can show using pctsum statistic) but I want to know how to find out the number of agents in each region using proc report.

Thanks

Sohail Mohammad

I need to get the count of distinct values of a variable in my proc report. How can we do it? Using the statistic 'n' is giving me the count of the number of rows but I want the count of the distinct values as some values in my data would be repeating. For ex: if there are 2 variables Region and AgentID and the data contains multiple records for each agentID in each region ( say, there are totally 5 regions) and I want the report to show the number of agents in each region then how can i show it using proc report? Also, I want to show the percent of agents in each region ( which I think I can show using pctsum statistic) but I want to know how to find out the number of agents in each region using proc report.

Thanks

Sohail Mohammad

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Soha

10-27-2010 02:58 PM

Hi:

I'm not exactly sure what you mean. Consider the data in SASHELP.CLASS. If I look at the number of students between the ages of 11 and 13, There are a total of 10 students -- 2 are age 11; 5 are age 12 and 3 are age 13. So that is 3 distinct values for AGE, but different distinct counts within each age group.

So which numbers do you want to see? Do you want to see the number 3 -- which represents the number of distinct AGE values or do you want to see the numbers within each age group????

cynthia

I'm not exactly sure what you mean. Consider the data in SASHELP.CLASS. If I look at the number of students between the ages of 11 and 13, There are a total of 10 students -- 2 are age 11; 5 are age 12 and 3 are age 13. So that is 3 distinct values for AGE, but different distinct counts within each age group.

So which numbers do you want to see? Do you want to see the number 3 -- which represents the number of distinct AGE values or do you want to see the numbers within each age group????

cynthia

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Soha

10-29-2010 04:28 AM

Hi.

proc report 's computed statement can do .

Cynthia@sas has ability to do it.

You need to post some data sample and the result which you wanted.

Ksharp

proc report 's computed statement can do .

Cynthia@sas has ability to do it.

You need to post some data sample and the result which you wanted.

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Soha

10-29-2010 06:48 PM

Hi,

I got what I was looking for but I had to do some preprocessing before using proc report.

What I was looking for is this:

data test;

input Region $ Sales AgentID;

cards;

E 12 1

E 14 1

E 17 2

E 12 1

E 14 3

E 12 2

E 18 3

N 18 4

N 16 4

N 12 5

N 17 4

N 25 4

S 12 7

S 12 8

S 13 7

S 12 8

W 27 9

;

run;

proc report data = test nowd;

column Region Sales AgentID;

define region / group;

define sales / analysis sum;

define AgentID / analysis n "Number of Agents";

rbreak after / summarize Ol UL;

run;

Now, when you see the report generated for the above data it doesn't give me the number of agents in each region, it gives me the count of the number of records in each region which is not equal to number of agents in each region as each agent has more than one record.

I know we can summarize the data first using data step or proc sql ( and that is what I did for the time being) and then use proc report. But I wanted to know if there is a way directly in proc report to get this done. May be using the computed block can help but I dont know how.

I got what I was looking for but I had to do some preprocessing before using proc report.

What I was looking for is this:

data test;

input Region $ Sales AgentID;

cards;

E 12 1

E 14 1

E 17 2

E 12 1

E 14 3

E 12 2

E 18 3

N 18 4

N 16 4

N 12 5

N 17 4

N 25 4

S 12 7

S 12 8

S 13 7

S 12 8

W 27 9

;

run;

proc report data = test nowd;

column Region Sales AgentID;

define region / group;

define sales / analysis sum;

define AgentID / analysis n "Number of Agents";

rbreak after / summarize Ol UL;

run;

Now, when you see the report generated for the above data it doesn't give me the number of agents in each region, it gives me the count of the number of records in each region which is not equal to number of agents in each region as each agent has more than one record.

I know we can summarize the data first using data step or proc sql ( and that is what I did for the time being) and then use proc report. But I wanted to know if there is a way directly in proc report to get this done. May be using the computed block can help but I dont know how.

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Soha

10-30-2010 01:51 PM

Hi:

If you want a report with only 4 report rows, plus a summary row, then you are correct that pre-processing with PROC REPORT was the correct way to go. If you were going to count "unique" agents inside PROC REPORT with a COMPUTE block, then AgentID would have to be a GROUP variable on the report -- which would mean that you would no longer have just 4 report rows (one row for each region.)

Remember that PROC REPORT does not have a PDV (Program Data Vector) like a DATA step program. PROC REPORT goes through a "summarizing" phase before it starts and so when you have just REGION as a GROUP item on the report, before PROC REPORT even starts, the input data are summarized accordingly and PROC REPORT -only- deals with the summarized rows.

Consider the program below that uses REGION and AGENTID as group items. The resulting report does show the number of unique agents, but because AGENTID has to be on the REPORT for counting/summarizing, the final report has more than 4 report rows (which is not what you showed as the desired output).

BTW, you do not have any AGENTID=6 in your posted INPUT data.

cynthia

[pre]

ods listing close;

ods html file='c:\temp\uniqagent.html' style=sasweb;

proc report data = test nowd;

column Region AgentID Sales sales=slcnt;

define region / group;

define agentID / group;

define sales / analysis sum;

define slcnt / analysis n "Number of Sales";

compute before;

** initialize counter to 0 at "top" of report;

uniqagent = 0;

endcomp;

compute agentID;

** increment counter for every AGENTID report row;

if _break_ ne '_RBREAK_' then uniqagent + 1;

endcomp;

rbreak after / summarize Ol UL;

compute after / style={just=r font_weight=bold};

line 'Number of agents = ' uniqagent 2.0;

endcomp;

run;

ods html close;

[/pre]

Compared to "pre-processing" approach:

[pre]

proc sort data=test out=test;

by region agentid;

run;

data test2;

set test; by agentid;

if first.agentid then agent_cnt = 1;

else agent_cnt = 0;

run;

ods listing;

proc print data=test2;

title 'what does agent_cnt look like';

run;

ods listing close;

ods html file='c:\temp\orig.html' style=sasweb;

proc report data = test2 nowd;

column Region Sales agent_cnt;

define region / group;

define sales / analysis sum;

define agent_cnt / analysis sum "Number of Agents";

rbreak after / summarize Ol UL;

run;

ods html close;

[/pre]

If you want a report with only 4 report rows, plus a summary row, then you are correct that pre-processing with PROC REPORT was the correct way to go. If you were going to count "unique" agents inside PROC REPORT with a COMPUTE block, then AgentID would have to be a GROUP variable on the report -- which would mean that you would no longer have just 4 report rows (one row for each region.)

Remember that PROC REPORT does not have a PDV (Program Data Vector) like a DATA step program. PROC REPORT goes through a "summarizing" phase before it starts and so when you have just REGION as a GROUP item on the report, before PROC REPORT even starts, the input data are summarized accordingly and PROC REPORT -only- deals with the summarized rows.

Consider the program below that uses REGION and AGENTID as group items. The resulting report does show the number of unique agents, but because AGENTID has to be on the REPORT for counting/summarizing, the final report has more than 4 report rows (which is not what you showed as the desired output).

BTW, you do not have any AGENTID=6 in your posted INPUT data.

cynthia

[pre]

ods listing close;

ods html file='c:\temp\uniqagent.html' style=sasweb;

proc report data = test nowd;

column Region AgentID Sales sales=slcnt;

define region / group;

define agentID / group;

define sales / analysis sum;

define slcnt / analysis n "Number of Sales";

compute before;

** initialize counter to 0 at "top" of report;

uniqagent = 0;

endcomp;

compute agentID;

** increment counter for every AGENTID report row;

if _break_ ne '_RBREAK_' then uniqagent + 1;

endcomp;

rbreak after / summarize Ol UL;

compute after / style={just=r font_weight=bold};

line 'Number of agents = ' uniqagent 2.0;

endcomp;

run;

ods html close;

[/pre]

Compared to "pre-processing" approach:

[pre]

proc sort data=test out=test;

by region agentid;

run;

data test2;

set test; by agentid;

if first.agentid then agent_cnt = 1;

else agent_cnt = 0;

run;

ods listing;

proc print data=test2;

title 'what does agent_cnt look like';

run;

ods listing close;

ods html file='c:\temp\orig.html' style=sasweb;

proc report data = test2 nowd;

column Region Sales agent_cnt;

define region / group;

define sales / analysis sum;

define agent_cnt / analysis sum "Number of Agents";

rbreak after / summarize Ol UL;

run;

ods html close;

[/pre]

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Soha

11-03-2010 03:27 AM

There is only an example of details report. I am not sure it is what you need.

Cynthia's second code is great. You can use it.

[pre]

data test;

input Region $ Sales AgentID;

cards;

E 12 1

E 14 1

E 17 2

E 12 1

E 14 3

E 12 2

E 18 3

N 18 4

N 16 4

N 12 5

N 17 4

N 25 4

S 12 7

S 12 8

S 13 7

S 12 8

W 27 9

;

run;

proc report data = test nowd out=temp;

column Region AgentID sales agent;

define region / group;

define AgentID / group;

define sales / analysis sum;

define agent /computed "Number of Agents";

compute before region;

count=-1;

endcomp;

compute agent;

if agentid ne lag(agentid) then count+1;

agent=count;

endcomp;

break after region/ summarize skip ol ;

run;

[/pre]

Ksharp Message was edited by: Ksharp

Cynthia's second code is great. You can use it.

[pre]

data test;

input Region $ Sales AgentID;

cards;

E 12 1

E 14 1

E 17 2

E 12 1

E 14 3

E 12 2

E 18 3

N 18 4

N 16 4

N 12 5

N 17 4

N 25 4

S 12 7

S 12 8

S 13 7

S 12 8

W 27 9

;

run;

proc report data = test nowd out=temp;

column Region AgentID sales agent;

define region / group;

define AgentID / group;

define sales / analysis sum;

define agent /computed "Number of Agents";

compute before region;

count=-1;

endcomp;

compute agent;

if agentid ne lag(agentid) then count+1;

agent=count;

endcomp;

break after region/ summarize skip ol ;

run;

[/pre]

Ksharp Message was edited by: Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

11-30-2012 08:44 PM

Hi Ksharp,

how can i prevent producing the first row of each Regoin using Proc Report.

The SAS System | ||||

Region | AgentID | Sales | Number | _BREAK_ |

of Agents | ||||

E | . | 99 | 0 | Region |

E | 1 | 38 | 0 | |

E | 2 | 29 | 1 | |

E | 3 | 32 | 2 | |

E | . | 99 | 3 | Region |

N | . | 88 | 3 | Region |

N | 4 | 76 | 0 | |

N | 5 | 12 | 1 | |

N | . | 88 | 2 | Region |

S | . | 49 | 2 | Region |

S | 7 | 25 | 0 | |

S | 8 | 24 | 1 | |

S | . | 49 | 2 | Region |

W | . | 27 | 2 | Region |

W | 9 | 27 | 0 | |

W | . | 27 | 1 | Region |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to tomtang

12-02-2012 11:10 PM

I don't understand what you mean exactly --- 'the first row of each Regoin ' ?

You want delete obs which are the first obs in each region group in TEMP dataset ? That is easy.

data x;

set temp;

if region ne lag(region) then delete;

run;

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

12-03-2012 01:56 PM

Hi Ksharp,

Thank you for your code.

Yes. we can use one more data step to get rid of the first row.

My interested is if there is any statement or option in REPORT procendure can do that?

to me, i only need the last row of each region which show me the total sales and unique agent.

so my question could be if there is any way to create the temp data show the last row of each region only?

Thanks

E | . | 99 | 0 | Region |

E | . | 99 | 3 | Region |

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to tomtang

12-03-2012 08:55 PM

That is easy. Just add a filter condition on the TEMP dataset.

proc report data = test nowd out=temp**(where=(agentid is missing))**;

But I prefer to use SQL.

proc sql;

create table want as

select region,sum(sales) as sum_sales,count(distinct agentid) as NumofAgentId

from test

group by region;

quit;

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

12-04-2012 04:11 PM

Hi Ksharp,

As we known, we can summarizing data using group by wtih cube / rollup in SQL server.

i dont think SAS proc sql has this option.

i am alway struggling with decision of using proc mean/report/tabulate or proc sql for summarizing data.

with your great SAS programming experince, what's your thoughts and recommendation.

Thanks,

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to tomtang

12-04-2012 10:12 PM

Well, I would firstly choose SQL way, which is more fast and more flexible and more succinct .Sure of course, If you want some special statistical estimator ,I will use proc means, SQL has some limitation about yielding these special statistical estimator . and I will use proc report / tabulate if you want some beautiful report or PDF/HTML .

Ksharp

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

Posted in reply to Ksharp

12-06-2012 03:20 PM

Thanks Ksharp