BookmarkSubscribeRSS Feed
Ronein
Meteorite | Level 14

Hello

I want to calculate counts(count members) and percentage from total  for each group.

The calculation is made from raw data table.

I know how to do it with proc sql but as you can see it is very long code.

I wonder if anyone suggest a better way(less code) to get same result.

The result is in a table called  Final

/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
/******Calculate counts and percentage from total*******/
data have;
input ID  score  ;
cards;
1  0
2  0
3  2
4  3
5  0
6  2
7  11
8  11
9  2
10 0
;
run;
PROC SQL;
	create table t1 as
	select 	score,1 as help,
                  count(*) as No_customers 
	from have
	group by score
;
QUIT;
PROC SQL;
	create table t2 as
	select 1 as help,
         sum(  No_customers) as Grand_Total_customers 
	from t1  
;
QUIT;
PROC SQL;
	create table t3 as
	select 	a.score,a.No_customers,
             a.No_customers/b.Grand_Total_customers as PCT_From_Total  format=percent7.1
	from t1 as a
	left join t2 as b
	on a.help=b.help
;
QUIT;
PROC SQL;
	create table t4 as
	select 	. as score  ,
	        sum(No_customers) as No_customers,
			sum(PCT_From_Total) as PCT_From_Total format=percent7.1
	from t3
;
QUIT;
Data Final;
Set t3  t4;
Run;
title;
proc print data=Final noobs;run;

 
11 REPLIES 11
hashman
Ammonite | Level 13

@Ronein: It's easier when you use an appropriate toolset. For example, try this:

 

proc freq noprint data=have ;                                                            
  tables score / out=freq outcum ;                                                       
run ;                                                                                    
                                                                                         
data want (keep=score count percent rename=(count=No_customers percent=PCT_from_Total)) ;
  set freq end = z ;                                                                     
  output ;                                                                               
  if z ;                                                                                 
  count = cum_freq ;                                                                     
  percent = cum_pct ;                                                                    
  score = . ;                                                                            
  output ;                                                                               
  format percent 7.1 ;                                                                   
run ;                                                                                    

HTH

Paul D.

andreas_lds
Jade | Level 19

Or proc tabulate:

proc format;
   picture pctfmt (round) other='009.9%';
run;

proc tabulate data=work.have;
   class score;
   table score='' all,n='No_customers' pctn='PCT_From_Total'*f=pctfmt. / box='score';
run;
hashman
Ammonite | Level 13

@Ronein:

 

On second thought, if your score values are limited-range integers (say, from 0 to 100, as assumed below) and you would like to do everything in a single step, you can use a key-indexed table:

 

data want (keep = score n p rename=(n=No_customers p=PCT_From_Total)) ;
  array f [0:100] _temporary_ ;                                        
  do until (end) ;                                                     
    set have end = end ;                                               
    f [score] + 1 ;                                                    
  end ;                                                                
  t = sum (of f [*]) ;                                                 
  do score = lbound (f) to hbound (f) ;                                
    n = f [score] ;                                                    
    if nmiss (n) then continue ;                                       
    p = divide (n, t) ;                                                
    tp + p ;                                                           
    output ;                                                           
  end ;                                                                
  score = . ;                                                          
  n = t ;                                                              
  p = tp ;                                                             
  output ;                                                             
  format p percent7.1 ;                                                
run ;                                                                  

However, if your score values are arbitrary-type, arbitrary-range, you still can do it in one step, but then the key-indexed table needs to be replaced with a hash table. It makes code somewhat lengthier but also makes no assumptions about data:

 

data want (keep = score n p rename=(n=No_customers p=PCT_From_Total)) ;
  dcl hash h () ;                                                      
  h.definekey ("score") ;                                              
  h.definedata ("score", "n") ;                                        
  h.definedone () ;                                                    
  do until (end) ;                                                     
    set have end = end ;                                               
    if h.find() ne 0 then n = 1 ;                                      
    else                  n + 1 ;                                      
    h.replace() ;                                                      
  end ;                                                                
  dcl hiter i ("h") ;                                                  
  do while (i.next() = 0) ;                                            
    t + n ;                                                            
  end ;                                                                
  do while (i.next() = 0) ;                                            
    p = divide (n, t) ;                                                
    output ;                                                           
    tp + p ;                                                           
  end ;                                                                
  score = . ;                                                          
  n = t ;                                                              
  p = tp ;                                                             
  output ;                                                             
  format p percent7.1 ;                                                
run ;                                                                  

By the way, I do like the TABULATE solution by @andreas_lds (and like this proc in general) but think that you need an output data set.

 

Paul D.

ballardw
Super User

@hashman wrote:

 

By the way, I do like the TABULATE solution by @andreas_lds (and like this proc in general) but think that you need an output data set.

 

Paul D.


Proc Tabulate will create output data sets, just add Out=datasetname to the Proc statement. However the structure of the output set doesn't look like what you might expect (at least not until you have used it a bit) with table indicators, All summary indicators, variables having statistics appended to the names like the Autoname option in proc means/summary, a different format of the _type_ variable and such. I do have some use for the tabulate output sets so get to dig into these. Often a data step is needed for some post processing though.

hashman
Ammonite | Level 13

@ballardw

 

But of course you're right. What I meant is that @andreas_lds didn't offer that provision. And your points about the TABULATE output data set format ought to be well taken. But even if the proc spitted it out cleanly, some post-processing would be needed, anyway (e.g., to set the score to null for the totals). The TABULATE output data set can still be quite useful under some scenarios, post-processing notwithstanding, since the proc can do quite a bit of rather convoluted aggregation in a few lines of code (and do it right, too).

 

Best

Paul D.   

PaigeMiller
Diamond | Level 26

Use the right tool! PROC FREQ is the simplest way to get counts and percentages.

--
Paige Miller
sasuser123123
Quartz | Level 8
And I've doubt ...
I've two variables TRT1,TRT2 those has values like 45(23),55(56) after concatinating count and percentage and I want to add new variable total which should contains total of count and percentage like 100(79)... Can you please tell me how to create
PaigeMiller
Diamond | Level 26

Please explain in detail. 

 

Show us (a portion of) your input data and the desired output. 

--
Paige Miller
sasuser123123
Quartz | Level 8
I've variables age ,sex ,TRT(contains trt1 and trt2) in a datase, now I want to calculate count and percentage of variables Region with respect to TRT and sex with respect to TRT..so I did that by using proc freq.I got output like...(after concatenate the count and percentage and also transpose the data)


Region. Trt1. Trt2
India. 23(28). 22(22)
Usa. 22(22). 23(28)

So no need the another variable Total after Trt2 that should contain total of trt1 and trt2 along with percentage (like For india Total should be 45(50) and for usa 45(50)..




PaigeMiller
Diamond | Level 26

You haven't shown us the input data. Please provide a portion of (not necessarily all of) your data, via these instructions: https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Data should be in the form above, not any other form.

 

We can't really provide code until we know what the input data looks like.

--
Paige Miller
sasuser123123
Quartz | Level 8
Hello, sorry for the mistake in above question..
So now I need the another variable Total after Trt2 that should contain total of trt1 and trt2 along with percentage (like For india Total should be 45(50) and for usa 45(50).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 10946 views
  • 0 likes
  • 6 in conversation