BookmarkSubscribeRSS Feed
Soha
Calcite | Level 5
Hi,

I need to get the count of distinct values of a variable in my proc report. How can we do it? Using the statistic 'n' is giving me the count of the number of rows but I want the count of the distinct values as some values in my data would be repeating. For ex: if there are 2 variables Region and AgentID and the data contains multiple records for each agentID in each region ( say, there are totally 5 regions) and I want the report to show the number of agents in each region then how can i show it using proc report? Also, I want to show the percent of agents in each region ( which I think I can show using pctsum statistic) but I want to know how to find out the number of agents in each region using proc report.

Thanks
Sohail Mohammad
12 REPLIES 12
Cynthia_sas
SAS Super FREQ
Hi:
I'm not exactly sure what you mean. Consider the data in SASHELP.CLASS. If I look at the number of students between the ages of 11 and 13, There are a total of 10 students -- 2 are age 11; 5 are age 12 and 3 are age 13. So that is 3 distinct values for AGE, but different distinct counts within each age group.

So which numbers do you want to see? Do you want to see the number 3 -- which represents the number of distinct AGE values or do you want to see the numbers within each age group????

cynthia
Ksharp
Super User
Hi.
proc report 's computed statement can do .
Cynthia@sas has ability to do it.
You need to post some data sample and the result which you wanted.

Ksharp
Soha
Calcite | Level 5
Hi,

I got what I was looking for but I had to do some preprocessing before using proc report.

What I was looking for is this:

data test;
input Region $ Sales AgentID;
cards;
E 12 1
E 14 1
E 17 2
E 12 1
E 14 3
E 12 2
E 18 3
N 18 4
N 16 4
N 12 5
N 17 4
N 25 4
S 12 7
S 12 8
S 13 7
S 12 8
W 27 9
;
run;

proc report data = test nowd;
column Region Sales AgentID;
define region / group;
define sales / analysis sum;
define AgentID / analysis n "Number of Agents";
rbreak after / summarize Ol UL;
run;

Now, when you see the report generated for the above data it doesn't give me the number of agents in each region, it gives me the count of the number of records in each region which is not equal to number of agents in each region as each agent has more than one record.

I know we can summarize the data first using data step or proc sql ( and that is what I did for the time being) and then use proc report. But I wanted to know if there is a way directly in proc report to get this done. May be using the computed block can help but I dont know how.
Cynthia_sas
SAS Super FREQ
Hi:
If you want a report with only 4 report rows, plus a summary row, then you are correct that pre-processing with PROC REPORT was the correct way to go. If you were going to count "unique" agents inside PROC REPORT with a COMPUTE block, then AgentID would have to be a GROUP variable on the report -- which would mean that you would no longer have just 4 report rows (one row for each region.)

Remember that PROC REPORT does not have a PDV (Program Data Vector) like a DATA step program. PROC REPORT goes through a "summarizing" phase before it starts and so when you have just REGION as a GROUP item on the report, before PROC REPORT even starts, the input data are summarized accordingly and PROC REPORT -only- deals with the summarized rows.

Consider the program below that uses REGION and AGENTID as group items. The resulting report does show the number of unique agents, but because AGENTID has to be on the REPORT for counting/summarizing, the final report has more than 4 report rows (which is not what you showed as the desired output).

BTW, you do not have any AGENTID=6 in your posted INPUT data.

cynthia
[pre]
ods listing close;
ods html file='c:\temp\uniqagent.html' style=sasweb;
proc report data = test nowd;
column Region AgentID Sales sales=slcnt;
define region / group;
define agentID / group;
define sales / analysis sum;
define slcnt / analysis n "Number of Sales";

compute before;
** initialize counter to 0 at "top" of report;
uniqagent = 0;
endcomp;
compute agentID;
** increment counter for every AGENTID report row;
if _break_ ne '_RBREAK_' then uniqagent + 1;
endcomp;

rbreak after / summarize Ol UL;
compute after / style={just=r font_weight=bold};
line 'Number of agents = ' uniqagent 2.0;
endcomp;
run;
ods html close;
[/pre]

Compared to "pre-processing" approach:
[pre]
proc sort data=test out=test;
by region agentid;
run;

data test2;
set test; by agentid;
if first.agentid then agent_cnt = 1;
else agent_cnt = 0;
run;

ods listing;
proc print data=test2;
title 'what does agent_cnt look like';
run;

ods listing close;
ods html file='c:\temp\orig.html' style=sasweb;
proc report data = test2 nowd;
column Region Sales agent_cnt;
define region / group;
define sales / analysis sum;
define agent_cnt / analysis sum "Number of Agents";

rbreak after / summarize Ol UL;
run;
ods html close;
[/pre]
Ksharp
Super User
There is only an example of details report. I am not sure it is what you need.
Cynthia's second code is great. You can use it.


[pre]
data test;
input Region $ Sales AgentID;
cards;
E 12 1
E 14 1
E 17 2
E 12 1
E 14 3
E 12 2
E 18 3
N 18 4
N 16 4
N 12 5
N 17 4
N 25 4
S 12 7
S 12 8
S 13 7
S 12 8
W 27 9
;
run;

proc report data = test nowd out=temp;
column Region AgentID sales agent;
define region / group;
define AgentID / group;
define sales / analysis sum;
define agent /computed "Number of Agents";

compute before region;
count=-1;
endcomp;

compute agent;
if agentid ne lag(agentid) then count+1;
agent=count;
endcomp;



break after region/ summarize skip ol ;
run;

[/pre]


Ksharp Message was edited by: Ksharp
tomtang
Calcite | Level 5

Hi Ksharp,

how can i prevent producing the first row of each Regoin using Proc Report.

    The SAS System
RegionAgentIDSalesNumber_BREAK_
of Agents
E.990Region
E1380
E2291
E3322
E.993Region
N.883Region
N4760
N5121
N.882Region
S.492Region
S7250
S8241
S.492Region
W.272Region
W9270
W.271Region
Ksharp
Super User

I don't understand what you mean exactly --- 'the first row of each Regoin ' ?

You want delete obs which are the first obs in each region group in TEMP dataset ? That is easy.

data x;

set temp;

if region ne lag(region) then delete;

run;

Ksharp

tomtang
Calcite | Level 5

Hi Ksharp,

Thank you for your code.

Yes. we can use one more data step to get rid of the first row.

My interested is if there is any statement or option in REPORT procendure can do that?

to me, i only need the last row of each region which show me the total sales and unique agent.

so my question could be if there is any way to create the temp data show the last row of each region only?

Thanks

E.990Region
E.993Region
Ksharp
Super User

That is easy. Just add  a filter condition on the TEMP dataset.

proc report data = test nowd out=temp(where=(agentid is missing));

But I prefer to use SQL.

proc sql;

create table want as

select region,sum(sales) as sum_sales,count(distinct agentid) as NumofAgentId

  from test

   group by region;

quit;

Ksharp

tomtang
Calcite | Level 5

Hi Ksharp,

As we known, we can summarizing data using group by wtih cube / rollup in SQL server.

i dont think SAS proc sql has this option.

i am alway struggling with decision of using proc mean/report/tabulate or proc sql for summarizing data.

with your great SAS programming experince, what's your thoughts and recommendation.

Thanks,

Ksharp
Super User

Well, I would firstly choose SQL way, which is more fast and more flexible and more succinct .Sure of course, If you want some special statistical estimator ,I will use proc means, SQL has some limitation about yielding these special statistical estimator . and I will use proc report / tabulate if you want some beautiful report or PDF/HTML .

Ksharp

tomtang
Calcite | Level 5

Thanks Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 16482 views
  • 0 likes
  • 4 in conversation