Help using Base SAS procedures

Getting count of distinct values using Proc Report

Reply
Occasional Contributor
Posts: 17

Getting count of distinct values using Proc Report

Hi,

I need to get the count of distinct values of a variable in my proc report. How can we do it? Using the statistic 'n' is giving me the count of the number of rows but I want the count of the distinct values as some values in my data would be repeating. For ex: if there are 2 variables Region and AgentID and the data contains multiple records for each agentID in each region ( say, there are totally 5 regions) and I want the report to show the number of agents in each region then how can i show it using proc report? Also, I want to show the percent of agents in each region ( which I think I can show using pctsum statistic) but I want to know how to find out the number of agents in each region using proc report.

Thanks
Sohail Mohammad
SAS Super FREQ
Posts: 8,868

Re: Getting count of distinct values using Proc Report

Hi:
I'm not exactly sure what you mean. Consider the data in SASHELP.CLASS. If I look at the number of students between the ages of 11 and 13, There are a total of 10 students -- 2 are age 11; 5 are age 12 and 3 are age 13. So that is 3 distinct values for AGE, but different distinct counts within each age group.

So which numbers do you want to see? Do you want to see the number 3 -- which represents the number of distinct AGE values or do you want to see the numbers within each age group????

cynthia
Super User
Posts: 10,044

Re: Getting count of distinct values using Proc Report

Hi.
proc report 's computed statement can do .
Cynthia@sas has ability to do it.
You need to post some data sample and the result which you wanted.

Ksharp
Occasional Contributor
Posts: 17

Re: Getting count of distinct values using Proc Report

Hi,

I got what I was looking for but I had to do some preprocessing before using proc report.

What I was looking for is this:

data test;
input Region $ Sales AgentID;
cards;
E 12 1
E 14 1
E 17 2
E 12 1
E 14 3
E 12 2
E 18 3
N 18 4
N 16 4
N 12 5
N 17 4
N 25 4
S 12 7
S 12 8
S 13 7
S 12 8
W 27 9
;
run;

proc report data = test nowd;
column Region Sales AgentID;
define region / group;
define sales / analysis sum;
define AgentID / analysis n "Number of Agents";
rbreak after / summarize Ol UL;
run;

Now, when you see the report generated for the above data it doesn't give me the number of agents in each region, it gives me the count of the number of records in each region which is not equal to number of agents in each region as each agent has more than one record.

I know we can summarize the data first using data step or proc sql ( and that is what I did for the time being) and then use proc report. But I wanted to know if there is a way directly in proc report to get this done. May be using the computed block can help but I dont know how.
SAS Super FREQ
Posts: 8,868

Re: Getting count of distinct values using Proc Report

Hi:
If you want a report with only 4 report rows, plus a summary row, then you are correct that pre-processing with PROC REPORT was the correct way to go. If you were going to count "unique" agents inside PROC REPORT with a COMPUTE block, then AgentID would have to be a GROUP variable on the report -- which would mean that you would no longer have just 4 report rows (one row for each region.)

Remember that PROC REPORT does not have a PDV (Program Data Vector) like a DATA step program. PROC REPORT goes through a "summarizing" phase before it starts and so when you have just REGION as a GROUP item on the report, before PROC REPORT even starts, the input data are summarized accordingly and PROC REPORT -only- deals with the summarized rows.

Consider the program below that uses REGION and AGENTID as group items. The resulting report does show the number of unique agents, but because AGENTID has to be on the REPORT for counting/summarizing, the final report has more than 4 report rows (which is not what you showed as the desired output).

BTW, you do not have any AGENTID=6 in your posted INPUT data.

cynthia
[pre]
ods listing close;
ods html file='c:\temp\uniqagent.html' style=sasweb;
proc report data = test nowd;
column Region AgentID Sales sales=slcnt;
define region / group;
define agentID / group;
define sales / analysis sum;
define slcnt / analysis n "Number of Sales";

compute before;
** initialize counter to 0 at "top" of report;
uniqagent = 0;
endcomp;
compute agentID;
** increment counter for every AGENTID report row;
if _break_ ne '_RBREAK_' then uniqagent + 1;
endcomp;

rbreak after / summarize Ol UL;
compute after / style={just=r font_weight=bold};
line 'Number of agents = ' uniqagent 2.0;
endcomp;
run;
ods html close;
[/pre]

Compared to "pre-processing" approach:
[pre]
proc sort data=test out=test;
by region agentid;
run;

data test2;
set test; by agentid;
if first.agentid then agent_cnt = 1;
else agent_cnt = 0;
run;

ods listing;
proc print data=test2;
title 'what does agent_cnt look like';
run;

ods listing close;
ods html file='c:\temp\orig.html' style=sasweb;
proc report data = test2 nowd;
column Region Sales agent_cnt;
define region / group;
define sales / analysis sum;
define agent_cnt / analysis sum "Number of Agents";

rbreak after / summarize Ol UL;
run;
ods html close;
[/pre]
Super User
Posts: 10,044

Re: Getting count of distinct values using Proc Report

There is only an example of details report. I am not sure it is what you need.
Cynthia's second code is great. You can use it.


[pre]
data test;
input Region $ Sales AgentID;
cards;
E 12 1
E 14 1
E 17 2
E 12 1
E 14 3
E 12 2
E 18 3
N 18 4
N 16 4
N 12 5
N 17 4
N 25 4
S 12 7
S 12 8
S 13 7
S 12 8
W 27 9
;
run;

proc report data = test nowd out=temp;
column Region AgentID sales agent;
define region / group;
define AgentID / group;
define sales / analysis sum;
define agent /computed "Number of Agents";

compute before region;
count=-1;
endcomp;

compute agent;
if agentid ne lag(agentid) then count+1;
agent=count;
endcomp;



break after region/ summarize skip ol ;
run;

[/pre]


Ksharp Message was edited by: Ksharp
Occasional Contributor
Posts: 10

Re: Getting count of distinct values using Proc Report

Hi Ksharp,

how can i prevent producing the first row of each Regoin using Proc Report.

    The SAS System
RegionAgentIDSalesNumber_BREAK_
of Agents
E.990Region
E1380
E2291
E3322
E.993Region
N.883Region
N4760
N5121
N.882Region
S.492Region
S7250
S8241
S.492Region
W.272Region
W9270
W.271Region
Super User
Posts: 10,044

Re: Getting count of distinct values using Proc Report

I don't understand what you mean exactly --- 'the first row of each Regoin ' ?

You want delete obs which are the first obs in each region group in TEMP dataset ? That is easy.

data x;

set temp;

if region ne lag(region) then delete;

run;

Ksharp

Occasional Contributor
Posts: 10

Re: Getting count of distinct values using Proc Report

Hi Ksharp,

Thank you for your code.

Yes. we can use one more data step to get rid of the first row.

My interested is if there is any statement or option in REPORT procendure can do that?

to me, i only need the last row of each region which show me the total sales and unique agent.

so my question could be if there is any way to create the temp data show the last row of each region only?

Thanks

E.990Region
E.993Region
Super User
Posts: 10,044

Re: Getting count of distinct values using Proc Report

That is easy. Just add  a filter condition on the TEMP dataset.

proc report data = test nowd out=temp(where=(agentid is missing));

But I prefer to use SQL.

proc sql;

create table want as

select region,sum(sales) as sum_sales,count(distinct agentid) as NumofAgentId

  from test

   group by region;

quit;

Ksharp

Occasional Contributor
Posts: 10

Re: Getting count of distinct values using Proc Report

Hi Ksharp,

As we known, we can summarizing data using group by wtih cube / rollup in SQL server.

i dont think SAS proc sql has this option.

i am alway struggling with decision of using proc mean/report/tabulate or proc sql for summarizing data.

with your great SAS programming experince, what's your thoughts and recommendation.

Thanks,

Super User
Posts: 10,044

Re: Getting count of distinct values using Proc Report

Well, I would firstly choose SQL way, which is more fast and more flexible and more succinct .Sure of course, If you want some special statistical estimator ,I will use proc means, SQL has some limitation about yielding these special statistical estimator . and I will use proc report / tabulate if you want some beautiful report or PDF/HTML .

Ksharp

Occasional Contributor
Posts: 10

Re: Getting count of distinct values using Proc Report

Thanks Ksharp

Ask a Question
Discussion stats
  • 12 replies
  • 8207 views
  • 0 likes
  • 4 in conversation