BookmarkSubscribeRSS Feed
Kivex
Calcite | Level 5

Hi there,

I am now learning proc sql procedure and have a question on counting and calculating percentages of subtotal.


Suppose I create one dataset looks like below:


groupgenderenrolledfinished
group 1MYY
group 1MYY
group 1FYY
group 1FYN
group 2MYY
group 2MYY
group 2MYY
group 2FYN
group 1FYN
group 1FYY
group 2MYY
group 2FYY

And I would like to get some summaries on counts and percentages based on groups and gender.

The output table looks like this:

statgroup 1group 2
enrolled66
finished4(66%)5(83%)
F2(50%)4(80%)
M2(50%)1(20%)

I just calculate the counts and percentages by hands but the basic ideas are the percentages of finished will be finished/enrolled, and the percentages in the gender are the proportions of finished people.

I would like to know how I can get the correct numbers and percentages using proc sql and count function in it. I also do not know whether I can transpose the output table in proc sql or not.

Any thought or explanations will help, thanks!

7 REPLIES 7
Steelers_In_DC
Barite | Level 11

There are a lot of different features that you can use to get your desired output.  I'm assuming you are not familiar with proc freq or proc transpose?  Is it necessary to only use proc sql;?

Kivex
Calcite | Level 5

Hi Mark,

I can use proc transpose and proc freq but sometimes I find it is hard to save the results from the procedure outputs and to change the sorting options. I think maybe there is a better way when using proc sql or using proc sql proc freq and proc transpose together. I just would like to find out an easy method to get the desired output. I think this is a typical example so I would like to know more, If you have any suggestions that would be helpful.

Thanks

Haris
Lapis Lazuli | Level 10

Here's something to get you going (not tested):

proc sql;

  select 'Enrolled' as Stat, sum(Enrolled='Y' AND Group='group 1') as Group1 'Group 1',

  sum(Enrolled='Y' AND Group='group 2') as Group1 'Group 2'

  from YourData

union

  select 'Finished' as Stat,

  cats( sum(Finished='Y' AND Group='group 1'),'(',sum(Finished='Y' AND Group='group 1')/sum(Group='group 1'),'%)' )

  cats( sum(Finished='Y' AND Group='group 2'),'(',sum(Finished='Y' AND Group='group 2')/sum(Group='group 2'),'%)' )

  from YourData;

quit;

You can also try processing with 'GROUP BY Group' clause but then your output will have two rows for Group 1 and Group 2 respectively--not columns.  You could transform the resulting table.

Kivex
Calcite | Level 5

I created a data set to test your codes and got several errors.

data test;

  input group $ gender $ Enrolled $ Finished $ @@;

  datalines;

  group1 M Y N

  group1 F Y N

  group1 M Y Y

  group1 F Y Y

  group1 F Y Y

  group1 M Y Y

  group1 M Y N

  group1 M Y N

  group1 F Y Y

  group1 M Y Y

  group2 F Y N

  group2 M Y Y

  group2 F Y Y

  group2 M Y Y

  group2 F Y Y

  group2 M Y N

  group2 M Y N

  group2 F Y Y

  group2 F Y Y

  group2 M Y Y

  ;

run;

Reeza
Super User

Your output doesn't appear to match your data. How are you calculating enrolled, finished, etc?

This gives you a quick test output:

proc tabulate data=test out=want;

class group gender enrolled finished;

table enrolled finished gender , group=''*(N='N'*f=8. colpctn='%'*f=8.);

run;

Haris
Lapis Lazuli | Level 10

Here's an updated and tested SQL code to get what you want:

proc sql;

  select 'Enrolled' as Stat,

  put(sum(Enrolled='Y' AND Group='group1'),8.) as Group1 'Group 1',

  put(sum(Enrolled='Y' AND Group='group2'),8.) as Group1 'Group 2'

  from Test

union

  select 'Finished' as Stat,

  cats( sum(Finished='Y' AND Group='group1'),' (',sum(Finished='Y' AND Group='group1')/sum(Group='group1')*100,'%)' ),

  cats( sum(Finished='Y' AND Group='group2'),' (',sum(Finished='Y' AND Group='group2')/sum(Group='group2')*100,'%)' )

  from Test

union

  select 'Gender=M' as Stat,

  cats( sum(Gender='M' AND Group='group1'),' (',sum(Gender='M' AND Group='group1')/sum(Group='group1')*100,'%)' ),

  cats( sum(Gender='M' AND Group='group2'),' (',sum(Gender='M' AND Group='group2')/sum(Group='group2')*100,'%)' )

  from Test

union

  select 'Gender=F' as Stat,

  cats( sum(Gender='F' AND Group='group1'),' (',sum(Gender='F' AND Group='group1')/sum(Group='group1')*100,'%)' ),

  cats( sum(Gender='F' AND Group='group2'),' (',sum(Gender='F' AND Group='group2')/sum(Group='group2')*100,'%)' )

  from Test

;

quit;

hbi
Quartz | Level 8 hbi
Quartz | Level 8

Hello,

Below is a pure PROC SQL solution to your question. The output would look like this:

    enrolled-finished-M-F.gif

As others have alluded to, there are most certainly many other ways to tackle this problem. That being said, these series of SQL statements run efficiently ... potentially more efficiently than using several separate PROC FREQ statements that generate datasets using the OUT option (for which the output would still have to be reassembled).

PROC SQL;

  CREATE TABLE example_data (

       group      VARCHAR(10)

     , gender     VARCHAR(1)

     , enrolled   VARCHAR(1)

     , finished   VARCHAR(1)

  );

QUIT;

/* same as your data */

PROC SQL;

INSERT INTO example_data values ('group 1', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 1', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'Y');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'F', 'Y', 'Y');

QUIT;

/* create one record table similar to Oracle's SYS.DUAL table */

/* this handy technique allows us to select macro variables within a SQL statement out of thin air */

DATA dummy_row; dummy = 'X'; RUN;

/* set up macro variables for later use */

PROC SQL;

  SELECT SUM(CASE WHEN enrolled = 'Y' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

       , SUM(CASE WHEN enrolled = 'Y' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

  INTO :enrolled_g1, :enrolled_g2

  FROM example_data;

  SELECT SUM(CASE WHEN finished = 'Y' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

       , SUM(CASE WHEN finished = 'Y' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

  INTO :finished_g1, :finished_g2

  FROM example_data;

QUIT;

/* use "UNION ALL" because using "UNION" will impact sort position */

PROC SQL;

  CREATE TABLE example_summary AS

  SELECT "enrolled" length=20 AS stat_label

       , &enrolled_g1         AS G1

       , . format=percent8.2  AS G1_ratio

       , &enrolled_g2         AS G2

       , . format=percent8.2  AS G2_ratio

  FROM dummy_row

  UNION ALL SELECT "finished" AS stat_label

       , &finished_g1 AS G1

       , (&finished_g1/&enrolled_g1) AS G1_ratio

       , &finished_g2 AS G2

       , (&finished_g2/&enrolled_g2) AS G2_ratio

  FROM dummy_row

  UNION ALL SELECT 'M' AS stat_label

       , SUM(CASE WHEN gender = 'M' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

       , (CALCULATED G1/&finished_g1) AS G1_ratio

       , SUM(CASE WHEN gender = 'M' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

       , (CALCULATED G2/&finished_g2) AS G2_ratio

  FROM example_data

  UNION ALL SELECT 'F' AS stat_label

       , SUM(CASE WHEN gender = 'F' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

       , (CALCULATED G1/&finished_g1) AS G1_ratio

       , SUM(CASE WHEN gender = 'F' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

       , (CALCULATED G2/&finished_g2) AS G2_ratio

  FROM example_data;

QUIT;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 7127 views
  • 0 likes
  • 5 in conversation