BookmarkSubscribeRSS Feed
turcay
Lapis Lazuli | Level 10

Hello everyone,

 

I’m trying to get the percentages of Missing, Grade1, Grade2, Grade3 columns. I should bring the Variable column values as unique so I used GROUP BY option; I also need to bring ID column as singular that’s why I used DISTINCT option. Actually, I need to calculate percentages of Missing, Grade1, Grade2, Grade3 columns by grouping Variable column and getting ID column as Distinct. While I can get the Variable column as unique in Want dataset, I can’t get the Variable column as unique when I try to use Subquery in Want2 dataset. Please find my desired output below. Could you help me or show me other methods to get my desired output, please ?

 

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;

PROC SQL;
   CREATE TABLE Want AS
   SELECT Variable
           ,(SUM(Grade1)) AS Count1 
           /*,(SUM(Grade1)/100) FORMAT=percent7.1 as Percent1*/
           ,(SUM(Grade2)) AS Count2 
           /*,(SUM(Grade2)/100) FORMAT=percent7.1 as Percent2*/
           ,(SUM(Grade3)) AS Count3 
           /*,(SUM(Grade3)/100) FORMAT=percent7.1 as Percent3*/
           ,(SUM(Missing)) AS CountM
           /*,(SUM(Missing)/100) FORMAT=percent7.1 as PercentM*/
           ,(COUNT(DISTINCT(ID))) AS CountT
           /*,(COUNT(DISTINCT(ID))/100) FORMAT=percent7.1 as PercentT*/
      FROM Have
      GROUP BY Variable;
QUIT;

PROC SQL;
Create Table Want2 As
Select 
Counts.*
/*H1.Variable
,Counts.Count1
,Counts.Count1/SUM(Counts.Count1) FORMAT=percent7.1 as Percent1
,Counts.Count2
,Counts.Count2/SUM(Counts.Count2) FORMAT=percent7.1 as Percent2
,Counts.Count3
,Counts.Count3/SUM(Counts.Count3) FORMAT=percent7.1 as Percent3

,Counts.CountM
,Counts.CountM/SUM(Counts.CountM) FORMAT=percent7.1 as PercentM
,Counts.CountT
,Counts.CountT/SUM(Counts.CountT) FORMAT=percent7.1 as PercentT*/
From Have H1,
	(Select Variable
	   ,SUM(Grade1) AS Count1 
	   ,SUM(Grade2) AS Count2
	   ,SUM(Grade3) AS Count3
	   ,SUM(Missing) AS CountM
	   ,COUNT(DISTINCT(ID)) AS CountT
	 From Have
	 GROUP BY Variable) Counts;
QUIT;

Desired.png

 

Thank you.

 

11 REPLIES 11
stevyfargose
Obsidian | Level 7

Try with Proc tabulate

 

proc tabulate data=have;
 class ID ;
 var Missing Grade1 Grade2 Grade3;
 Table ID, pctn* (Missing Grade1 Grade2 Grade3);
  run; 

  Missing Grade1 Grade2 Grade3
PctN PctN PctN PctN
ID 33.33 33.33 33.33 33.33
0001
0002 33.33 33.33 33.33 33.33
0003 33.33 33.33 33.33 33.33
turcay
Lapis Lazuli | Level 10

Hello @stevyfargose,

 

Thank you for trying to help me. But I need to see Variable and Count columns too. Is it possible to get by the help of PROC TABULATE. And I also need to get this results as SAS Dataset output.

 

Thank you.

 

 

stevyfargose
Obsidian | Level 7

Use out= for output.

 

Below table has three dimensions

 

proc tabulate data=have out=want;
 Class Variable ID;
 var Missing Grade1 Grade2 Grade3;
  Table variable, ID , pctn*Missing Missing*n
                       pctn*Grade1 Grade1*n
                       pctn*Grade2 Grade2*n
                       pctn*Grade3 Grade3*n;
  run; 

 

Variable Variable1
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
 
Variable Variable2
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
 
Variable Variable3
  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
ID 11.11 2 11.11 2 11.11 2 11.11 2
0001
0002 11.11 2 11.11 2 11.11 2 11.11 2
0003 11.11 2 11.11 2 11.11 2 11.11 2
stevyfargose
Obsidian | Level 7

You can also create two dimensional table but the value of frequency(n) changes

 

proc tabulate data=have out=want;
 Class Variable ID;
 var Missing Grade1 Grade2 Grade3;
  Table variable ID , pctn*Missing Missing*n
                       pctn*Grade1 Grade1*n
                       pctn*Grade2 Grade2*n
                       pctn*Grade3 Grade3*n;
  run; 

 

  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing N Grade1 N Grade2 N Grade3 N
Variable 33.33 6 33.33 6 33.33 6 33.33 6
Variable1
Variable2 33.33 6 33.33 6 33.33 6 33.33 6
Variable3 33.33 6 33.33 6 33.33 6 33.33 6
ID 33.33 6 33.33 6 33.33 6 33.33 6
0001
0002 33.33 6 33.33 6 33.33 6 33.33 6
0003 33.33 6 33.33 6 33.33 6 33.33 6
turcay
Lapis Lazuli | Level 10

Hi again @stevyfargose ,

 

First of all, thank you for these information. Actually, I'm not familiar with PROC TABULATE however I can understand better due to your examples but it also doesn't seem to give my desired output. And I further want to see what is my mistake in Subquery statement which I used in Want2 data step.

 

Thanks.

stevyfargose
Obsidian | Level 7

I understood my mistake I was using freq statistics instead of sum, I am not sure if this is the exact answer you are looking but hope it helps

 

proc tabulate data=have out=want ;
 Class Variable  ;
 var Missing   Grade1 Grade2 Grade3;
  Table variable, pctn*Missing  Missing*sum
                       pctn*Grade1  Grade1*sum
                       pctn*Grade2  Grade2*sum
                       pctn*Grade3  Grade3*sum;
  run;

 

  PctN Missing PctN Grade1 PctN Grade2 PctN Grade3
Missing Sum Grade1 Sum Grade2 Sum Grade3 Sum
Variable 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
Variable1
Variable2 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
Variable3 33.33 0.00 33.33 2.00 33.33 2.00 33.33 2.00
turcay
Lapis Lazuli | Level 10

Hello @stevyfargose,

 

You are insistent about PROC TABULATE 🙂  Thank you it is almost done. I need to Drop default columns and create CountT and PercentT columns

 

Thanks.

Reeza
Super User

Proc tabulate is a good solution because you can calculate multiple statistics within a single proc. Otherwise you're most likely going to end up using several procs and then merging the results. If you want a dynamic result, this may be what you want anyways.

 

You can use SQL but then it becomes a bit more hard coded.

 

Proc freq will generate the first portion, counts and percentages, as well as missing. 

For count distinct it's either a double proc freq or a proc sql.  

You'll most likely need a proc transpose as well, to keep the dynamic nature of the report.

 

Ksharp
Super User

Assuming I understand what you mean.

 

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;


PROC SQL;
   CREATE TABLE Want AS
   SELECT Variable
           ,(SUM(Grade1)) AS Count1 
           ,(SUM(Grade1)/(select SUM(Grade1) from Have)) FORMAT=percent7.1 as Percent1
           ,(SUM(Grade2)) AS Count2 
           ,(SUM(Grade2)/(select SUM(Grade2) from Have))  FORMAT=percent7.1 as Percent2
           ,(SUM(Grade3)) AS Count3 
           ,(SUM(Grade3)/(select SUM(Grade3) from Have))  FORMAT=percent7.1 as Percent3
           ,(SUM(Missing)) AS CountM
           ,(SUM(Missing)/(select SUM(Missing) from Have))  FORMAT=percent7.1 as PercentM
           ,(COUNT(DISTINCT(ID))) AS CountT
           ,(COUNT(DISTINCT(ID))/(select sum(n) from (select COUNT(DISTINCT(ID)) as n from Have group by Variable))) FORMAT=percent7.1 as PercentT
      FROM Have
      GROUP BY Variable;
QUIT;
stevyfargose
Obsidian | Level 7

DATA Have;
Length Variable $ 20 ID $ 10 Missing 8 Grade1 8 Grade2 8 Grade3 8;
Infile Datalines Missover ;
Input Variable ID Missing Grade1 Grade2 Grade3;
Datalines;
Variable1 0001 0 0 1 0
Variable1 0001 0 1 0 0
Variable1 0002 0 0 0 1
Variable1 0002 0 0 1 0
Variable1 0003 0 1 0 0
Variable1 0003 0 0 0 1
Variable2 0001 0 1 0 0
Variable2 0001 0 1 0 0
Variable2 0002 0 0 1 0
Variable2 0002 0 0 0 1
Variable2 0003 0 0 0 1
Variable2 0003 0 0 1 0
Variable3 0001 0 1 0 0
Variable3 0001 0 1 0 0
Variable3 0002 0 0 1 0
Variable3 0002 0 0 0 1
Variable3 0003 0 0 1 0
Variable3 0003 0 0 0 1
;
Run;

PROC SQL;
   CREATE TABLE Want AS
   SELECT have.Variable,
           (SUM(Grade1)) AS Count1
           ,calculated Count1/Sum as Percent1 format=percent8.2
           ,(SUM(Grade2)) AS Count2
           ,calculated Count2/Sum as Percent2 format=percent8.2
           ,(SUM(Grade3)) AS Count3
            ,calculated Count3/Sum as Percent3 format=percent8.2
           ,(SUM(Missing)) AS CountM
            ,calculated CountM/Sum as PercentM format=percent8.2
           ,(COUNT(DISTINCT(ID))) AS CountT
            ,calculated CountT/Sum as PercentT format=percent8.2
      FROM Have,
       (Select variable, count(*) as sum from have
       group by variable) as have2
     where have.variable=have2.variable
      GROUP BY have.Variable;
QUIT;
proc print data=want;
run;

Obs Variable Count1 Percent1 Count2 Percent2 Count3 Percent3 CountM PercentM CountT PercentT
1 Variable1 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
2 Variable2 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
3 Variable3 2 33.33% 2 33.33% 2 33.33% 0 0.00% 3 50.00%
turcay
Lapis Lazuli | Level 10

Hello @stevyfargose,

 

Total CountT is equal to 9 so we should divide it to 3. PercentT column should get the values as %33.33. If we fix the PercentT column we will reach the desired output result.

 

Thanks. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 2087 views
  • 7 likes
  • 4 in conversation