How do I obtain the standard deviation of statistics calculated using SQL aggregate functions such as count().

Accepted Solution Solved
Reply
N/A
Posts: 1
Accepted Solution

How do I obtain the standard deviation of statistics calculated using SQL aggregate functions such as count().

Hello all,    

I can’t seem to wrap my mind around this and was wondering if anyone could help clarify things.  How do I calculate the standard deviation of calculations of SQL count data that was collapsed across observations and then how do I carry that through the calculations I’m executing in SQL?  This is hard to explain so I made an illustrative example and tried to describe clearly the calculation I’m doing and the standard deviation I’m looking for.  I have a hunch I’m overlooking some fundamental concept here. 

Let’s assume I have a cohort of 20 people (8 exposed, 12 unexposed) and I am interested in determining the proportion of people that had each of three procedures, stratified by exposure status.  I can only see claims for 13 people here, 7 had no procedure and so don’t appear in this data table. My eventual goal is to generate the usual Table 1 comparing exposed to unexposed and including a standardized difference (of procedure rates).  Unfortunately, I can't accomplish the first step of generating standard deviation estimates. 


My data is in the following format:

TABLE 1. DATA_SET_A

Procedure

Exposed

Subject

1

1

Person 1

1

1

Person 2

1

1

Person 3

1

1

Person 4

1

0

Person 5

1

0

Person 6

1

0

Person 7

1

0

Person 8

2

1

Person 1

2

1

Person 3

2

0

Person 6

2

0

Person 8

2

1

Person 9

2

1

Person 10

2

0

Person 11

2

1

Person 12

2

0

Person 13

3

0

Person 5

3

0

Person 8

3

1

Person 10

3

0

Person 11

3

0

Person 13

  1. First, I use PROC SQL to pull population counts (stratified by exposure) from my source population, which contains records for all people [STEP 1].
  2. Next, I use PROC SQL to collapse DATA_SET_A so each procedure is listed, for each exposure status, with the count [STEP 2].
  3. Finally, I use LEFT JOIN to link procedure/exposure numerators with their appropriate denominators and perform the proportion calculation [STEP 3].

Please refer to the SAS code I've pasted at the bottom of this post.

This code generates Table 2 (see below).  How do obtain the standard deviations for these proportion estimates (see ? in the table below)? I don’t know how to obtain the standard deviation of a count that I’ve collapsed in SQL.  I tried using aggregate functions such as std(count(*)) AS StdDev, but SAS didn’t like that.  Do I have to obtain the standard deviation during the count step and then carry it forward or can I somehow obtain it during my proportion step?  I know how to calculate the standard deviation if I transform my dataset such that each procedure is a variable instead of having a procedure variable that aligns with the counts.  The problem is that my real problem involves hundreds of procedures so I can't switch them all over manually.

TABLE 2. DATA_SET_B

Procedure

Exposed

Count

Denominator

Proportion

  1. Std. Dev.

1

1

4

8

4/8

  1. 0.50

?

1

0

4

12

4/12

  1. 0.33

?

2

1

5

8

5/8

  1. 0.63

?

2

0

4

12

4/12

  1. 0.33

?

3

1

1

8

1/8

  1. 0.13

?

3

0

4

12

4/12

  1. 0.33

?

Thanks for your help. Please be blunt if I’m missing something obvious. 



CODE BELOW:


*CALCULATING THE PROPORTION OF PATIENTS THAT WERE CODED FOR EACH PROCEDURE AT LEAST ONCE IN THE FOLLOW-UP PERIOD*;

STEP 1

      *COUNTS FOR DENOMENATORS*;

                        *COUNT UNIQUE PATIENTS*;

PROC SQL;

CREATE TABLE work.Denominators AS

      SELECT DISTINCT exposed, count(*) AS Denominator

      FROM (SELECT DISTINCT exposed, subject FROM work.SourcePopn)           

GROUP BY exposed;

QUIT;

STEP 2

*COUNT UP PROCEDURS FOR NUMERATOR*;

PROC SQL;

      CREATE TABLE work.Numerators AS

      SELECT DISTINCT exposed, procedure, count(*) AS count

      FROM (SELECT DISTINCT subject, procedure, exposed FROM work.DATA_SET_A)

      GROUP BY exposed, procedure

      ORDER BY exposed DESC, count DESC;

QUIT;

STEP 3

*THIS CODE ATTACHES THE DENOMENATOR AND DOES PROPORTION CALCULATION*;

PROC SQL;

      CREATE TABLE work.DATA_SET_B AS

SELECT a.procedure, a.exposed, a.count, b.denominator,

(a.count/b.denominator) AS Proportion

      FROM work.Numerators AS a LEFT JOIN work.Denominators as b

      ON a.exposed=b.exposed

      ORDER BY exposed DESC, count DESC;

RUN;


Accepted Solutions
Solution
‎11-21-2013 04:28 PM
Trusted Advisor
Posts: 1,615

Re: How do I obtain the standard deviation of statistics calculated using SQL aggregate functions such as count().

The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.

View solution in original post


All Replies
Solution
‎11-21-2013 04:28 PM
Trusted Advisor
Posts: 1,615

Re: How do I obtain the standard deviation of statistics calculated using SQL aggregate functions such as count().

The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 1 reply
  • 592 views
  • 0 likes
  • 2 in conversation