Variable Variable1 | |||||||||||||||||||||||||||||||||||||||||||||
|
Hello everyone,
I’m trying to get the percentages of Missing, Grade1, Grade2, Grade3 columns. I should bring the Variable column values as unique so I used GROUP BY option; I also need to bring ID column as singular that’s why I used DISTINCT option. Actually, I need to calculate percentages of Missing, Grade1, Grade2, Grade3 columns by grouping Variable column and getting ID column as Distinct. While I can get the Variable column as unique in Want dataset, I can’t get the Variable column as unique when I try to use Subquery in Want2 dataset. Please find my desired output below. Could you help me or show me other methods to get my desired output, please ?
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT Variable
,(SUM(Grade1)) AS Count1
/*,(SUM(Grade1)/100) FORMAT=percent7.1 as Percent1*/
,(SUM(Grade2)) AS Count2
/*,(SUM(Grade2)/100) FORMAT=percent7.1 as Percent2*/
,(SUM(Grade3)) AS Count3
/*,(SUM(Grade3)/100) FORMAT=percent7.1 as Percent3*/
,(SUM(Missing)) AS CountM
/*,(SUM(Missing)/100) FORMAT=percent7.1 as PercentM*/
,(COUNT(DISTINCT(ID))) AS CountT
/*,(COUNT(DISTINCT(ID))/100) FORMAT=percent7.1 as PercentT*/
FROM Have
GROUP BY Variable;
QUIT;
PROC SQL;
Create Table Want2 As
Select
Counts.*
/*H1.Variable
,Counts.Count1
,Counts.Count1/SUM(Counts.Count1) FORMAT=percent7.1 as Percent1
,Counts.Count2
,Counts.Count2/SUM(Counts.Count2) FORMAT=percent7.1 as Percent2
,Counts.Count3
,Counts.Count3/SUM(Counts.Count3) FORMAT=percent7.1 as Percent3
,Counts.CountM
,Counts.CountM/SUM(Counts.CountM) FORMAT=percent7.1 as PercentM
,Counts.CountT
,Counts.CountT/SUM(Counts.CountT) FORMAT=percent7.1 as PercentT*/
From Have H1,
(Select Variable
,SUM(Grade1) AS Count1
,SUM(Grade2) AS Count2
,SUM(Grade3) AS Count3
,SUM(Missing) AS CountM
,COUNT(DISTINCT(ID)) AS CountT
From Have
GROUP BY Variable) Counts;
QUIT;
Thank you.
Try with Proc tabulate
proc tabulate data=have;
class ID ;
var Missing Grade1 Grade2 Grade3;
Table ID, pctn* (Missing Grade1 Grade2 Grade3);
run;
Missing | Grade1 | Grade2 | Grade3 | |
---|---|---|---|---|
PctN | PctN | PctN | PctN | |
ID | 33.33 | 33.33 | 33.33 | 33.33 |
0001 | ||||
0002 | 33.33 | 33.33 | 33.33 | 33.33 |
0003 | 33.33 | 33.33 | 33.33 | 33.33 |
Hello @stevyfargose,
Thank you for trying to help me. But I need to see Variable and Count columns too. Is it possible to get by the help of PROC TABULATE. And I also need to get this results as SAS Dataset output.
Thank you.
Use out= for output.
Below table has three dimensions
proc tabulate data=have out=want;
Class Variable ID;
var Missing Grade1 Grade2 Grade3;
Table variable, ID , pctn*Missing Missing*n
pctn*Grade1 Grade1*n
pctn*Grade2 Grade2*n
pctn*Grade3 Grade3*n;
run;
Variable Variable1 | |||||||||||||||||||||||||||||||||||||||||||||
|
Variable Variable2 | |||||||||||||||||||||||||||||||||||||||||||||
|
Variable Variable3 | |||||||||||||||||||||||||||||||||||||||||||||
|
You can also create two dimensional table but the value of frequency(n) changes
proc tabulate data=have out=want;
Class Variable ID;
var Missing Grade1 Grade2 Grade3;
Table variable ID , pctn*Missing Missing*n
pctn*Grade1 Grade1*n
pctn*Grade2 Grade2*n
pctn*Grade3 Grade3*n;
run;
PctN | Missing | PctN | Grade1 | PctN | Grade2 | PctN | Grade3 | |
---|---|---|---|---|---|---|---|---|
Missing | N | Grade1 | N | Grade2 | N | Grade3 | N | |
Variable | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Variable1 | ||||||||
Variable2 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Variable3 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
ID | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
0001 | ||||||||
0002 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
0003 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Hi again @stevyfargose ,
First of all, thank you for these information. Actually, I'm not familiar with PROC TABULATE however I can understand better due to your examples but it also doesn't seem to give my desired output. And I further want to see what is my mistake in Subquery statement which I used in Want2 data step.
Thanks.
I understood my mistake I was using freq statistics instead of sum, I am not sure if this is the exact answer you are looking but hope it helps
proc tabulate data=have out=want ;
Class Variable ;
var Missing Grade1 Grade2 Grade3;
Table variable, pctn*Missing Missing*sum
pctn*Grade1 Grade1*sum
pctn*Grade2 Grade2*sum
pctn*Grade3 Grade3*sum;
run;
PctN | Missing | PctN | Grade1 | PctN | Grade2 | PctN | Grade3 | |
---|---|---|---|---|---|---|---|---|
Missing | Sum | Grade1 | Sum | Grade2 | Sum | Grade3 | Sum | |
Variable | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Variable1 | ||||||||
Variable2 | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Variable3 | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Hello @stevyfargose,
You are insistent about PROC TABULATE 🙂 Thank you it is almost done. I need to Drop default columns and create CountT and PercentT columns
Thanks.
Proc tabulate is a good solution because you can calculate multiple statistics within a single proc. Otherwise you're most likely going to end up using several procs and then merging the results. If you want a dynamic result, this may be what you want anyways.
You can use SQL but then it becomes a bit more hard coded.
Proc freq will generate the first portion, counts and percentages, as well as missing.
For count distinct it's either a double proc freq or a proc sql.
You'll most likely need a proc transpose as well, to keep the dynamic nature of the report.
Assuming I understand what you mean.
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT Variable
,(SUM(Grade1)) AS Count1
,(SUM(Grade1)/(select SUM(Grade1) from Have)) FORMAT=percent7.1 as Percent1
,(SUM(Grade2)) AS Count2
,(SUM(Grade2)/(select SUM(Grade2) from Have)) FORMAT=percent7.1 as Percent2
,(SUM(Grade3)) AS Count3
,(SUM(Grade3)/(select SUM(Grade3) from Have)) FORMAT=percent7.1 as Percent3
,(SUM(Missing)) AS CountM
,(SUM(Missing)/(select SUM(Missing) from Have)) FORMAT=percent7.1 as PercentM
,(COUNT(DISTINCT(ID))) AS CountT
,(COUNT(DISTINCT(ID))/(select sum(n) from (select COUNT(DISTINCT(ID)) as n from Have group by Variable))) FORMAT=percent7.1 as PercentT
FROM Have
GROUP BY Variable;
QUIT;
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT have.Variable,
(SUM(Grade1)) AS Count1
,calculated Count1/Sum as Percent1 format=percent8.2
,(SUM(Grade2)) AS Count2
,calculated Count2/Sum as Percent2 format=percent8.2
,(SUM(Grade3)) AS Count3
,calculated Count3/Sum as Percent3 format=percent8.2
,(SUM(Missing)) AS CountM
,calculated CountM/Sum as PercentM format=percent8.2
,(COUNT(DISTINCT(ID))) AS CountT
,calculated CountT/Sum as PercentT format=percent8.2
FROM Have,
(Select variable, count(*) as sum from have
group by variable) as have2
where have.variable=have2.variable
GROUP BY have.Variable;
QUIT;
proc print data=want;
run;
Obs | Variable | Count1 | Percent1 | Count2 | Percent2 | Count3 | Percent3 | CountM | PercentM | CountT | PercentT |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Variable1 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
2 | Variable2 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
3 | Variable3 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
Hello @stevyfargose,
Total CountT is equal to 9 so we should divide it to 3. PercentT column should get the values as %33.33. If we fix the PercentT column we will reach the desired output result.
Thanks.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.