Hello!
I have a dataset (simplified example) with only one variable.
Data have;
input class$;
A
A
A
B
B
C
C
C
C
D
I would like to calculate the percentages and counts of each variable in my dataset using PROC SQL.
Output would be similiar to proc freq:
Desired output should be ;
class |
Frequency |
Percent |
A |
3 |
30 |
B |
2 |
20 |
C |
4 |
40 |
D |
1 |
10 |
How can I do this with PROC SQL?
Thanks!
Hi @Chris_Loke I copy pasted from online for you 🙂
Data have;
input class$;
cards;
A
A
A
B
B
C
C
C
C
D
;
proc sql;
create table want as
select class,count(class) as frequency,calculated frequency/(select count(*) from have) as pct format=percent.
from have
group by class;
quit;
Hi @Chris_Loke I copy pasted from online for you 🙂
Data have;
input class$;
cards;
A
A
A
B
B
C
C
C
C
D
;
proc sql;
create table want as
select class,count(class) as frequency,calculated frequency/(select count(*) from have) as pct format=percent.
from have
group by class;
quit;
Thanks!!!
Hi,
How about like that:
proc sql;
select a.class, count(a.class) as countclass, (calculated countclass/b.all)*100 as percent
from
have as a, (select count(class) as all from have) as b
group by a.class
;
quit;
All the best
Bart
Why not use PROC FREQ which does the calculations by default, and doesn't have the drawback that missing values will cause the WRONG percentages to be calculated by PROC SQL (unless missing is a valid category). PROC FREQ gives you a choice how to handle missings, PROC SQL does not. PROC FREQ easily generalizes to two-way tables while PROC SQL does not. There are so many benefits to using PROC FREQ, I would not recommend PROC SQL for this purpose.
Agree with @PaigeMiller
Here is a quick example. I added missing values in the original data set and included the out= option to create new tables if you need them.
Data have;
infile datalines missover;
input class$;
datalines;
A
A
A
B
B
C
C
C
C
D
;
run;
/*Ignore missing in frequency. Will show how many are missing in the output table but won't count them in the percentage*/
title "Ignore Missing in Frequency";
proc freq data=have;
tables class /out=newtbl1(where=(class ne " ")) /*create a new table and remove the missing row from it*/
nocum /*remove cumulative freq*/;
run;
/*Include missing in frequency and will include missing values in the percentage*/
title "Include Missing in Frequency";
proc freq data=have;
tables class /out=newtbl2 /*create a new table*/
missing /*remove cumulative freq*/
nocum /*include missing values in frequency*/;
run;
title;
HTML Results
- Peter
Thank you for replying!
The thing is that I have ten different datasets for ten different years that I want to perform my counts on. I want join all the years togheter in to one table and the export it to excel.
If this is possible to do with Proc Freq I would be happy to learn it!
Both FREQ and SQL take only one input so it doesn't' matter which you use. Unless you create a view first and then run the SQL or FREQ on the view. Either way, FREQ would still be more efficient in terms of processing time, but you already have a solution so now it's most efficient to use that.
@Chris_Loke wrote:
Thank you for replying!
The thing is that I have ten different datasets for ten different years that I want to perform my counts on. I want join all the years togheter in to one table and the export it to excel.
If this is possible to do with Proc Freq I would be happy to learn it!
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.