Variable Variable1 | |||||||||||||||||||||||||||||||||||||||||||||
|
Hello everyone,
I’m trying to get the percentages of Missing, Grade1, Grade2, Grade3 columns. I should bring the Variable column values as unique so I used GROUP BY option; I also need to bring ID column as singular that’s why I used DISTINCT option. Actually, I need to calculate percentages of Missing, Grade1, Grade2, Grade3 columns by grouping Variable column and getting ID column as Distinct. While I can get the Variable column as unique in Want dataset, I can’t get the Variable column as unique when I try to use Subquery in Want2 dataset. Please find my desired output below. Could you help me or show me other methods to get my desired output, please ?
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT Variable
,(SUM(Grade1)) AS Count1
/*,(SUM(Grade1)/100) FORMAT=percent7.1 as Percent1*/
,(SUM(Grade2)) AS Count2
/*,(SUM(Grade2)/100) FORMAT=percent7.1 as Percent2*/
,(SUM(Grade3)) AS Count3
/*,(SUM(Grade3)/100) FORMAT=percent7.1 as Percent3*/
,(SUM(Missing)) AS CountM
/*,(SUM(Missing)/100) FORMAT=percent7.1 as PercentM*/
,(COUNT(DISTINCT(ID))) AS CountT
/*,(COUNT(DISTINCT(ID))/100) FORMAT=percent7.1 as PercentT*/
FROM Have
GROUP BY Variable;
QUIT;
PROC SQL;
Create Table Want2 As
Select
Counts.*
/*H1.Variable
,Counts.Count1
,Counts.Count1/SUM(Counts.Count1) FORMAT=percent7.1 as Percent1
,Counts.Count2
,Counts.Count2/SUM(Counts.Count2) FORMAT=percent7.1 as Percent2
,Counts.Count3
,Counts.Count3/SUM(Counts.Count3) FORMAT=percent7.1 as Percent3
,Counts.CountM
,Counts.CountM/SUM(Counts.CountM) FORMAT=percent7.1 as PercentM
,Counts.CountT
,Counts.CountT/SUM(Counts.CountT) FORMAT=percent7.1 as PercentT*/
From Have H1,
(Select Variable
,SUM(Grade1) AS Count1
,SUM(Grade2) AS Count2
,SUM(Grade3) AS Count3
,SUM(Missing) AS CountM
,COUNT(DISTINCT(ID)) AS CountT
From Have
GROUP BY Variable) Counts;
QUIT;
Thank you.
Try with Proc tabulate
proc tabulate data=have;
class ID ;
var Missing Grade1 Grade2 Grade3;
Table ID, pctn* (Missing Grade1 Grade2 Grade3);
run;
Missing | Grade1 | Grade2 | Grade3 | |
---|---|---|---|---|
PctN | PctN | PctN | PctN | |
ID | 33.33 | 33.33 | 33.33 | 33.33 |
0001 | ||||
0002 | 33.33 | 33.33 | 33.33 | 33.33 |
0003 | 33.33 | 33.33 | 33.33 | 33.33 |
Hello @stevyfargose,
Thank you for trying to help me. But I need to see Variable and Count columns too. Is it possible to get by the help of PROC TABULATE. And I also need to get this results as SAS Dataset output.
Thank you.
Use out= for output.
Below table has three dimensions
proc tabulate data=have out=want;
Class Variable ID;
var Missing Grade1 Grade2 Grade3;
Table variable, ID , pctn*Missing Missing*n
pctn*Grade1 Grade1*n
pctn*Grade2 Grade2*n
pctn*Grade3 Grade3*n;
run;
Variable Variable1 | |||||||||||||||||||||||||||||||||||||||||||||
|
Variable Variable2 | |||||||||||||||||||||||||||||||||||||||||||||
|
Variable Variable3 | |||||||||||||||||||||||||||||||||||||||||||||
|
You can also create two dimensional table but the value of frequency(n) changes
proc tabulate data=have out=want;
Class Variable ID;
var Missing Grade1 Grade2 Grade3;
Table variable ID , pctn*Missing Missing*n
pctn*Grade1 Grade1*n
pctn*Grade2 Grade2*n
pctn*Grade3 Grade3*n;
run;
PctN | Missing | PctN | Grade1 | PctN | Grade2 | PctN | Grade3 | |
---|---|---|---|---|---|---|---|---|
Missing | N | Grade1 | N | Grade2 | N | Grade3 | N | |
Variable | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Variable1 | ||||||||
Variable2 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Variable3 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
ID | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
0001 | ||||||||
0002 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
0003 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 | 33.33 | 6 |
Hi again @stevyfargose ,
First of all, thank you for these information. Actually, I'm not familiar with PROC TABULATE however I can understand better due to your examples but it also doesn't seem to give my desired output. And I further want to see what is my mistake in Subquery statement which I used in Want2 data step.
Thanks.
I understood my mistake I was using freq statistics instead of sum, I am not sure if this is the exact answer you are looking but hope it helps
proc tabulate data=have out=want ;
Class Variable ;
var Missing Grade1 Grade2 Grade3;
Table variable, pctn*Missing Missing*sum
pctn*Grade1 Grade1*sum
pctn*Grade2 Grade2*sum
pctn*Grade3 Grade3*sum;
run;
PctN | Missing | PctN | Grade1 | PctN | Grade2 | PctN | Grade3 | |
---|---|---|---|---|---|---|---|---|
Missing | Sum | Grade1 | Sum | Grade2 | Sum | Grade3 | Sum | |
Variable | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Variable1 | ||||||||
Variable2 | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Variable3 | 33.33 | 0.00 | 33.33 | 2.00 | 33.33 | 2.00 | 33.33 | 2.00 |
Hello @stevyfargose,
You are insistent about PROC TABULATE 🙂 Thank you it is almost done. I need to Drop default columns and create CountT and PercentT columns
Thanks.
Proc tabulate is a good solution because you can calculate multiple statistics within a single proc. Otherwise you're most likely going to end up using several procs and then merging the results. If you want a dynamic result, this may be what you want anyways.
You can use SQL but then it becomes a bit more hard coded.
Proc freq will generate the first portion, counts and percentages, as well as missing.
For count distinct it's either a double proc freq or a proc sql.
You'll most likely need a proc transpose as well, to keep the dynamic nature of the report.
Assuming I understand what you mean.
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT Variable
,(SUM(Grade1)) AS Count1
,(SUM(Grade1)/(select SUM(Grade1) from Have)) FORMAT=percent7.1 as Percent1
,(SUM(Grade2)) AS Count2
,(SUM(Grade2)/(select SUM(Grade2) from Have)) FORMAT=percent7.1 as Percent2
,(SUM(Grade3)) AS Count3
,(SUM(Grade3)/(select SUM(Grade3) from Have)) FORMAT=percent7.1 as Percent3
,(SUM(Missing)) AS CountM
,(SUM(Missing)/(select SUM(Missing) from Have)) FORMAT=percent7.1 as PercentM
,(COUNT(DISTINCT(ID))) AS CountT
,(COUNT(DISTINCT(ID))/(select sum(n) from (select COUNT(DISTINCT(ID)) as n from Have group by Variable))) FORMAT=percent7.1 as PercentT
FROM Have
GROUP BY Variable;
QUIT;
DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;
PROC SQL;
CREATE TABLE Want AS
SELECT have.Variable,
(SUM(Grade1)) AS Count1
,calculated Count1/Sum as Percent1 format=percent8.2
,(SUM(Grade2)) AS Count2
,calculated Count2/Sum as Percent2 format=percent8.2
,(SUM(Grade3)) AS Count3
,calculated Count3/Sum as Percent3 format=percent8.2
,(SUM(Missing)) AS CountM
,calculated CountM/Sum as PercentM format=percent8.2
,(COUNT(DISTINCT(ID))) AS CountT
,calculated CountT/Sum as PercentT format=percent8.2
FROM Have,
(Select variable, count(*) as sum from have
group by variable) as have2
where have.variable=have2.variable
GROUP BY have.Variable;
QUIT;
proc print data=want;
run;
Obs | Variable | Count1 | Percent1 | Count2 | Percent2 | Count3 | Percent3 | CountM | PercentM | CountT | PercentT |
---|---|---|---|---|---|---|---|---|---|---|---|
1 | Variable1 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
2 | Variable2 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
3 | Variable3 | 2 | 33.33% | 2 | 33.33% | 2 | 33.33% | 0 | 0.00% | 3 | 50.00% |
Hello @stevyfargose,
Total CountT is equal to 9 so we should divide it to 3. PercentT column should get the values as %33.33. If we fix the PercentT column we will reach the desired output result.
Thanks.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.