DATA Step, Macro, Functions and more

How to - Dynamically Get Percentage of Column by Grouping Another Column

Reply
Super Contributor
Posts: 381

How to - Dynamically Get Percentage of Column by Grouping Another Column

Hello everyone,

 

I’m trying to get the percentages of Missing, Grade1, Grade2, Grade3 columns. I should bring the Variable column values as unique so I used GROUP BY option; I also need to bring ID column as singular that’s why I used DISTINCT option. Actually, I need to calculate percentages of Missing, Grade1, Grade2, Grade3 columns by grouping Variable column and getting ID column as Distinct. While I can get the Variable column as unique in Want dataset, I can’t get the Variable column as unique when I try to use Subquery in Want2 dataset. Please find my desired output below. Could you help me or show me other methods to get my desired output, please ?

 

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;

PROC SQL;
   CREATE TABLE Want AS
   SELECT Variable
           ,(SUM(Grade1)) AS Count1 
           /*,(SUM(Grade1)/100) FORMAT=percent7.1 as Percent1*/
           ,(SUM(Grade2)) AS Count2 
           /*,(SUM(Grade2)/100) FORMAT=percent7.1 as Percent2*/
           ,(SUM(Grade3)) AS Count3 
           /*,(SUM(Grade3)/100) FORMAT=percent7.1 as Percent3*/
           ,(SUM(Missing)) AS CountM
           /*,(SUM(Missing)/100) FORMAT=percent7.1 as PercentM*/
           ,(COUNT(DISTINCT(ID))) AS CountT
           /*,(COUNT(DISTINCT(ID))/100) FORMAT=percent7.1 as PercentT*/
      FROM Have
      GROUP BY Variable;
QUIT;

PROC SQL;
Create Table Want2 As
Select 
Counts.*
/*H1.Variable
,Counts.Count1
,Counts.Count1/SUM(Counts.Count1) FORMAT=percent7.1 as Percent1
,Counts.Count2
,Counts.Count2/SUM(Counts.Count2) FORMAT=percent7.1 as Percent2
,Counts.Count3
,Counts.Count3/SUM(Counts.Count3) FORMAT=percent7.1 as Percent3

,Counts.CountM
,Counts.CountM/SUM(Counts.CountM) FORMAT=percent7.1 as PercentM
,Counts.CountT
,Counts.CountT/SUM(Counts.CountT) FORMAT=percent7.1 as PercentT*/
From Have H1,
	(Select Variable
	   ,SUM(Grade1) AS Count1 
	   ,SUM(Grade2) AS Count2
	   ,SUM(Grade3) AS Count3
	   ,SUM(Missing) AS CountM
	   ,COUNT(DISTINCT(ID)) AS CountT
	 From Have
	 GROUP BY Variable) Counts;
QUIT;

Desired.png

 

Thank you.

 

Occasional Contributor
Posts: 12

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Try with Proc tabulate

 

proc tabulate data=have;
 class ID ;
 var Missing Grade1 Grade2 Grade3;
 Table ID, pctn* (Missing Grade1 Grade2 Grade3);
  run; 

  Missing Grade1 Grade2 Grade3
PctN PctN PctN PctN
ID 33.33 33.33 33.33 33.33
0001
0002 33.33 33.33 33.33 33.33
0003 33.33 33.33 33.33 33.33
Super Contributor
Posts: 381

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Hello @stevyfargose,

 

Thank you for trying to help me. But I need to see Variable and Count columns too. Is it possible to get by the help of PROC TABULATE. And I also need to get this results as SAS Dataset output.

 

Thank you.

 

 

Occasional Contributor
Posts: 12

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Use out= for output.

 

Below table has three dimensions

 

proc tabulate data=have out=want;
 Class Variable ID;
 var Missing Grade1 Grade2 Grade3;
  Table variable, ID , pctn*Missing Missing*n
                       pctn*Grade1 Grade1*n
                       pctn*Grade2 Grade2*n
                       pctn*Grade3 Grade3*n;
  run; 

 

Variable Variable1
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
 
Variable Variable2
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
 
Variable Variable3
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
Occasional Contributor
Posts: 12

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

[ Edited ]

You can also create two dimensional table but the value of frequency(n) changes

 

proc tabulate data=have out=want;
 Class Variable ID;
 var Missing Grade1 Grade2 Grade3;
  Table variable ID , pctn*Missing Missing*n
                       pctn*Grade1 Grade1*n
                       pctn*Grade2 Grade2*n
                       pctn*Grade3 Grade3*n;
  run; 

 

  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
Variable 33.33 6 33.33 6 33.33 6 33.33 6
Variable1
Variable2 33.33 6 33.33 6 33.33 6 33.33 6
Variable3 33.33 6 33.33 6 33.33 6 33.33 6
ID 33.33 6 33.33 6 33.33 6 33.33 6
0001
0002 33.33 6 33.33 6 33.33 6 33.33 6
0003 33.33 6 33.33 6 33.33 6 33.33 6
Super Contributor
Posts: 381

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Hi again @stevyfargose ,

 

First of all, thank you for these information. Actually, I'm not familiar with PROC TABULATE however I can understand better due to your examples but it also doesn't seem to give my desired output. And I further want to see what is my mistake in Subquery statement which I used in Want2 data step.

 

Thanks.

Occasional Contributor
Posts: 12

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

I understood my mistake I was using freq statistics instead of sum, I am not sure if this is the exact answer you are looking but hope it helps

 

proc tabulate data=have out=want ;
 Class Variable  ;
 var Missing   Grade1 Grade2 Grade3;
  Table variable, pctn*Missing  Missing*sum
                       pctn*Grade1  Grade1*sum
                       pctn*Grade2  Grade2*sum
                       pctn*Grade3  Grade3*sum;
  run;

 

  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing Sum Grade1 Sum Grade2 Sum Grade3 Sum
Variable 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
Variable1
Variable2 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
Variable3 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
Super Contributor
Posts: 381

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Hello @stevyfargose,

 

You are insistent about PROC TABULATE Smiley Happy  Thank you it is almost done. I need to Drop default columns and create CountT and PercentT columns

 

Thanks.

Super User
Posts: 17,775

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Proc tabulate is a good solution because you can calculate multiple statistics within a single proc. Otherwise you're most likely going to end up using several procs and then merging the results. If you want a dynamic result, this may be what you want anyways.

 

You can use SQL but then it becomes a bit more hard coded.

 

Proc freq will generate the first portion, counts and percentages, as well as missing. 

For count distinct it's either a double proc freq or a proc sql.  

You'll most likely need a proc transpose as well, to keep the dynamic nature of the report.

 

Super User
Posts: 9,671

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Assuming I understand what you mean.

 

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;


PROC SQL;
   CREATE TABLE Want AS
   SELECT Variable
           ,(SUM(Grade1)) AS Count1 
           ,(SUM(Grade1)/(select SUM(Grade1) from Have)) FORMAT=percent7.1 as Percent1
           ,(SUM(Grade2)) AS Count2 
           ,(SUM(Grade2)/(select SUM(Grade2) from Have))  FORMAT=percent7.1 as Percent2
           ,(SUM(Grade3)) AS Count3 
           ,(SUM(Grade3)/(select SUM(Grade3) from Have))  FORMAT=percent7.1 as Percent3
           ,(SUM(Missing)) AS CountM
           ,(SUM(Missing)/(select SUM(Missing) from Have))  FORMAT=percent7.1 as PercentM
           ,(COUNT(DISTINCT(ID))) AS CountT
           ,(COUNT(DISTINCT(ID))/(select sum(n) from (select COUNT(DISTINCT(ID)) as n from Have group by Variable))) FORMAT=percent7.1 as PercentT
      FROM Have
      GROUP BY Variable;
QUIT;
Occasional Contributor
Posts: 12

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;

PROC SQL;
   CREATE TABLE Want AS
   SELECT have.Variable,
           (SUM(Grade1)) AS Count1
           ,calculated Count1/Sum as Percent1 format=percent8.2
           ,(SUM(Grade2)) AS Count2
           ,calculated Count2/Sum as Percent2 format=percent8.2
           ,(SUM(Grade3)) AS Count3
            ,calculated Count3/Sum as Percent3 format=percent8.2
           ,(SUM(Missing)) AS CountM
            ,calculated CountM/Sum as PercentM format=percent8.2
           ,(COUNT(DISTINCT(ID))) AS CountT
            ,calculated CountT/Sum as PercentT format=percent8.2
      FROM Have,
       (Select variable, count(*) as sum from have
       group by variable) as have2
     where have.variable=have2.variable
      GROUP BY have.Variable;
QUIT;
proc print data=want;
run;

Obs Variable Count1 Percent1 Count2 Percent2 Count3 Percent3 CountM PercentM CountT PercentT
1 Variable1 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
2 Variable2 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
3 Variable3 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
Super Contributor
Posts: 381

Re: How to - Dynamically Get Percentage of Column by Grouping Another Column

Hello @stevyfargose,

 

Total CountT is equal to 9 so we should divide it to 3. PercentT column should get the values as %33.33. If we fix the PercentT column we will reach the desired output result.

 

Thanks. 

Ask a Question
Discussion stats
  • 11 replies
  • 371 views
  • 7 likes
  • 4 in conversation