Hello
I want to calculate counts(count members) and percentage from total for each group.
The calculation is made from raw data table.
I know how to do it with proc sql but as you can see it is very long code.
I wonder if anyone suggest a better way(less code) to get same result.
The result is in a table called Final
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
data have;
input ID score ;
cards;
1 0
2 0
3 2
4 3
5 0
6 2
7 11
8 11
9 2
10 0
;
run;
PROC SQL;
create table t1 as
select score,1 as help,
count(*) as No_customers
from have
group by score
;
QUIT;
PROC SQL;
create table t2 as
select 1 as help,
sum( No_customers) as Grand_Total_customers
from t1
;
QUIT;
PROC SQL;
create table t3 as
select a.score,a.No_customers,
a.No_customers/b.Grand_Total_customers as PCT_From_Total format=percent7.1
from t1 as a
left join t2 as b
on a.help=b.help
;
QUIT;
PROC SQL;
create table t4 as
select . as score ,
sum(No_customers) as No_customers,
sum(PCT_From_Total) as PCT_From_Total format=percent7.1
from t3
;
QUIT;
Data Final;
Set t3 t4;
Run;
title;
proc print data=Final noobs;run;
@Ronein: It's easier when you use an appropriate toolset. For example, try this:
proc freq noprint data=have ;
tables score / out=freq outcum ;
run ;
data want (keep=score count percent rename=(count=No_customers percent=PCT_from_Total)) ;
set freq end = z ;
output ;
if z ;
count = cum_freq ;
percent = cum_pct ;
score = . ;
output ;
format percent 7.1 ;
run ;
HTH
Paul D.
Or proc tabulate:
proc format;
picture pctfmt (round) other='009.9%';
run;
proc tabulate data=work.have;
class score;
table score='' all,n='No_customers' pctn='PCT_From_Total'*f=pctfmt. / box='score';
run;
On second thought, if your score values are limited-range integers (say, from 0 to 100, as assumed below) and you would like to do everything in a single step, you can use a key-indexed table:
data want (keep = score n p rename=(n=No_customers p=PCT_From_Total)) ;
array f [0:100] _temporary_ ;
do until (end) ;
set have end = end ;
f [score] + 1 ;
end ;
t = sum (of f [*]) ;
do score = lbound (f) to hbound (f) ;
n = f [score] ;
if nmiss (n) then continue ;
p = divide (n, t) ;
tp + p ;
output ;
end ;
score = . ;
n = t ;
p = tp ;
output ;
format p percent7.1 ;
run ;
However, if your score values are arbitrary-type, arbitrary-range, you still can do it in one step, but then the key-indexed table needs to be replaced with a hash table. It makes code somewhat lengthier but also makes no assumptions about data:
data want (keep = score n p rename=(n=No_customers p=PCT_From_Total)) ;
dcl hash h () ;
h.definekey ("score") ;
h.definedata ("score", "n") ;
h.definedone () ;
do until (end) ;
set have end = end ;
if h.find() ne 0 then n = 1 ;
else n + 1 ;
h.replace() ;
end ;
dcl hiter i ("h") ;
do while (i.next() = 0) ;
t + n ;
end ;
do while (i.next() = 0) ;
p = divide (n, t) ;
output ;
tp + p ;
end ;
score = . ;
n = t ;
p = tp ;
output ;
format p percent7.1 ;
run ;
By the way, I do like the TABULATE solution by @andreas_lds (and like this proc in general) but think that you need an output data set.
Paul D.
@hashman wrote:
By the way, I do like the TABULATE solution by @andreas_lds (and like this proc in general) but think that you need an output data set.
Paul D.
Proc Tabulate will create output data sets, just add Out=datasetname to the Proc statement. However the structure of the output set doesn't look like what you might expect (at least not until you have used it a bit) with table indicators, All summary indicators, variables having statistics appended to the names like the Autoname option in proc means/summary, a different format of the _type_ variable and such. I do have some use for the tabulate output sets so get to dig into these. Often a data step is needed for some post processing though.
But of course you're right. What I meant is that @andreas_lds didn't offer that provision. And your points about the TABULATE output data set format ought to be well taken. But even if the proc spitted it out cleanly, some post-processing would be needed, anyway (e.g., to set the score to null for the totals). The TABULATE output data set can still be quite useful under some scenarios, post-processing notwithstanding, since the proc can do quite a bit of rather convoluted aggregation in a few lines of code (and do it right, too).
Best
Paul D.
Use the right tool! PROC FREQ is the simplest way to get counts and percentages.
Please explain in detail.
Show us (a portion of) your input data and the desired output.
You haven't shown us the input data. Please provide a portion of (not necessarily all of) your data, via these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...
Data should be in the form above, not any other form.
We can't really provide code until we know what the input data looks like.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.