Re: Subtotal count and percentage using proc sql

Kivex · Posted 08-27-2015 01:31 PM

Hi there,

I am now learning proc sql procedure and have a question on counting and calculating percentages of subtotal.

Suppose I create one dataset looks like below:

group	gender	enrolled	finished
group 1	M	Y	Y
group 1	M	Y	Y
group 1	F	Y	Y
group 1	F	Y	N
group 2	M	Y	Y
group 2	M	Y	Y
group 2	M	Y	Y
group 2	F	Y	N
group 1	F	Y	N
group 1	F	Y	Y
group 2	M	Y	Y
group 2	F	Y	Y

And I would like to get some summaries on counts and percentages based on groups and gender.

The output table looks like this:

stat	group 1	group 2
enrolled	6	6
finished	4(66%)	5(83%)
F	2(50%)	4(80%)
M	2(50%)	1(20%)

I just calculate the counts and percentages by hands but the basic ideas are the percentages of finished will be finished/enrolled, and the percentages in the gender are the proportions of finished people.

I would like to know how I can get the correct numbers and percentages using proc sql and count function in it. I also do not know whether I can transpose the output table in proc sql or not.

Any thought or explanations will help, thanks!

Steelers_In_DC · Posted 08-27-2015 03:38 PM

There are a lot of different features that you can use to get your desired output. I'm assuming you are not familiar with proc freq or proc transpose? Is it necessary to only use proc sql;?

Kivex · Posted 08-27-2015 04:10 PM

Hi Mark,

I can use proc transpose and proc freq but sometimes I find it is hard to save the results from the procedure outputs and to change the sorting options. I think maybe there is a better way when using proc sql or using proc sql proc freq and proc transpose together. I just would like to find out an easy method to get the desired output. I think this is a typical example so I would like to know more, If you have any suggestions that would be helpful.

Thanks

Haris · Posted 08-27-2015 05:33 PM

Here's something to get you going (not tested):

proc sql;

select 'Enrolled' as Stat, sum(Enrolled='Y' AND Group='group 1') as Group1 'Group 1',

sum(Enrolled='Y' AND Group='group 2') as Group1 'Group 2'

from YourData

union

select 'Finished' as Stat,

cats( sum(Finished='Y' AND Group='group 1'),'(',sum(Finished='Y' AND Group='group 1')/sum(Group='group 1'),'%)' )

cats( sum(Finished='Y' AND Group='group 2'),'(',sum(Finished='Y' AND Group='group 2')/sum(Group='group 2'),'%)' )

from YourData;

quit;

You can also try processing with 'GROUP BY Group' clause but then your output will have two rows for Group 1 and Group 2 respectively--not columns. You could transform the resulting table.

Kivex · Posted 08-28-2015 11:09 AM

I created a data set to test your codes and got several errors.

data test;

input group $ gender $ Enrolled $ Finished $ @@;

datalines;

group1 M Y N

group1 F Y N

group1 M Y Y

group1 F Y Y

group1 M Y Y

group1 M Y N

group1 F Y Y

group1 M Y Y

group2 F Y N

group2 M Y Y

group2 F Y Y

group2 M Y Y

group2 F Y Y

group2 M Y N

group2 F Y Y

group2 M Y Y

;

run;

Reeza · Posted 08-28-2015 11:38 AM

Your output doesn't appear to match your data. How are you calculating enrolled, finished, etc?

This gives you a quick test output:

proc tabulate data=test out=want;

class group gender enrolled finished;

table enrolled finished gender , group=''*(N='N'*f=8. colpctn='%'*f=8.);

run;

Haris · Posted 08-28-2015 01:00 PM

Here's an updated and tested SQL code to get what you want:

proc sql;

select 'Enrolled' as Stat,

put(sum(Enrolled='Y' AND Group='group1'),8.) as Group1 'Group 1',

put(sum(Enrolled='Y' AND Group='group2'),8.) as Group1 'Group 2'

from Test

union

select 'Finished' as Stat,

cats( sum(Finished='Y' AND Group='group1'),' (',sum(Finished='Y' AND Group='group1')/sum(Group='group1')*100,'%)' ),

cats( sum(Finished='Y' AND Group='group2'),' (',sum(Finished='Y' AND Group='group2')/sum(Group='group2')*100,'%)' )

from Test

union

select 'Gender=M' as Stat,

cats( sum(Gender='M' AND Group='group1'),' (',sum(Gender='M' AND Group='group1')/sum(Group='group1')*100,'%)' ),

cats( sum(Gender='M' AND Group='group2'),' (',sum(Gender='M' AND Group='group2')/sum(Group='group2')*100,'%)' )

from Test

union

select 'Gender=F' as Stat,

cats( sum(Gender='F' AND Group='group1'),' (',sum(Gender='F' AND Group='group1')/sum(Group='group1')*100,'%)' ),

cats( sum(Gender='F' AND Group='group2'),' (',sum(Gender='F' AND Group='group2')/sum(Group='group2')*100,'%)' )

from Test

;

quit;

hbi · Posted 08-28-2015 01:08 PM

Hello,

Below is a pure PROC SQL solution to your question. The output would look like this:

As others have alluded to, there are most certainly many other ways to tackle this problem. That being said, these series of SQL statements run efficiently ... potentially more efficiently than using several separate PROC FREQ statements that generate datasets using the OUT option (for which the output would still have to be reassembled).

PROC SQL;

CREATE TABLE example_data (

group VARCHAR(10)

, gender VARCHAR(1)

, enrolled VARCHAR(1)

, finished VARCHAR(1)

);

QUIT;

/* same as your data */

PROC SQL;

INSERT INTO example_data values ('group 1', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'Y');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'N');

INSERT INTO example_data values ('group 1', 'F', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'M', 'Y', 'Y');

INSERT INTO example_data values ('group 2', 'F', 'Y', 'Y');

QUIT;

/* create one record table similar to Oracle's SYS.DUAL table */

/* this handy technique allows us to select macro variables within a SQL statement out of thin air */

DATA dummy_row; dummy = 'X'; RUN;

/* set up macro variables for later use */

PROC SQL;

SELECT SUM(CASE WHEN enrolled = 'Y' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

, SUM(CASE WHEN enrolled = 'Y' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

INTO :enrolled_g1, :enrolled_g2

FROM example_data;

SELECT SUM(CASE WHEN finished = 'Y' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

, SUM(CASE WHEN finished = 'Y' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

INTO :finished_g1, :finished_g2

FROM example_data;

QUIT;

/* use "UNION ALL" because using "UNION" will impact sort position */

PROC SQL;

CREATE TABLE example_summary AS

SELECT "enrolled" length=20 AS stat_label

, &enrolled_g1 AS G1

, . format=percent8.2 AS G1_ratio

, &enrolled_g2 AS G2

, . format=percent8.2 AS G2_ratio

FROM dummy_row

UNION ALL SELECT "finished" AS stat_label

, &finished_g1 AS G1

, (&finished_g1/&enrolled_g1) AS G1_ratio

, &finished_g2 AS G2

, (&finished_g2/&enrolled_g2) AS G2_ratio

FROM dummy_row

UNION ALL SELECT 'M' AS stat_label

, SUM(CASE WHEN gender = 'M' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

, (CALCULATED G1/&finished_g1) AS G1_ratio

, SUM(CASE WHEN gender = 'M' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

, (CALCULATED G2/&finished_g2) AS G2_ratio

FROM example_data

UNION ALL SELECT 'F' AS stat_label

, SUM(CASE WHEN gender = 'F' AND group = 'group 1' THEN 1 ELSE 0 END) AS G1

, (CALCULATED G1/&finished_g1) AS G1_ratio

, SUM(CASE WHEN gender = 'F' AND group = 'group 2' THEN 1 ELSE 0 END) AS G2

, (CALCULATED G2/&finished_g2) AS G2_ratio

FROM example_data;

QUIT;

Registration is open

SAS Training: Just a Click Away