Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- General Programming
- /
- How do I obtain the standard deviation of statisti...

- RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-21-2013 03:22 PM

Hello all,

I can’t seem to wrap my mind around this and was wondering if anyone could help clarify things. How do I calculate the standard deviation of calculations of SQL count data that was collapsed across observations and then how do I carry that through the calculations I’m executing in SQL? This is hard to explain so I made an illustrative example and tried to describe clearly the calculation I’m doing and the standard deviation I’m looking for. I have a hunch I’m overlooking some fundamental concept here.

Let’s assume I have a cohort of 20 people (8 exposed, 12 unexposed) and I am interested in determining the proportion of people that had each of three procedures, stratified by exposure status. I can only see claims for 13 people here, 7 had no procedure and so don’t appear in this data table. My eventual goal is to generate the usual Table 1 comparing exposed to unexposed and including a standardized difference (of procedure rates). Unfortunately, I can't accomplish the first step of generating standard deviation estimates.

My data is in the following format:

**TABLE 1. DATA_SET_A**

Procedure | Exposed | Subject |

1 | 1 | Person 1 |

1 | 1 | Person 2 |

1 | 1 | Person 3 |

1 | 1 | Person 4 |

1 | 0 | Person 5 |

1 | 0 | Person 6 |

1 | 0 | Person 7 |

1 | 0 | Person 8 |

2 | 1 | Person 1 |

2 | 1 | Person 3 |

2 | 0 | Person 6 |

2 | 0 | Person 8 |

2 | 1 | Person 9 |

2 | 1 | Person 10 |

2 | 0 | Person 11 |

2 | 1 | Person 12 |

2 | 0 | Person 13 |

3 | 0 | Person 5 |

3 | 0 | Person 8 |

3 | 1 | Person 10 |

3 | 0 | Person 11 |

3 | 0 | Person 13 |

- First, I use PROC SQL to pull population counts (stratified by exposure) from my source population, which contains records for all people [STEP 1].
- Next, I use PROC SQL to collapse DATA_SET_A so each procedure is listed, for each exposure status, with the count [STEP 2].
- Finally, I use LEFT JOIN to link procedure/exposure numerators with their appropriate denominators and perform the proportion calculation [STEP 3].

**Please refer to the SAS code I've pasted at the bottom of this post.**

This code generates Table 2 (see below). **How do obtain the standard deviations for these proportion estimates (see ? in the table below)? ** I don’t know how to obtain the standard deviation of a count that I’ve collapsed in SQL. I tried using aggregate functions such as std(count(*)) AS StdDev, but SAS didn’t like that. Do I have to obtain the standard deviation during the count step and then carry it forward or can I somehow obtain it during my proportion step? I know how to calculate the standard deviation if I transform my dataset such that each procedure is a variable instead of having a procedure variable that aligns with the counts. The problem is that my real problem involves hundreds of procedures so I can't switch them all over manually.

**TABLE 2. DATA_SET_B **

Procedure | Exposed | Count | Denominator | Proportion | - Std. Dev.
| |

1 | 1 | 4 | 8 | 4/8 | - 0.50
| ? |

1 | 0 | 4 | 12 | 4/12 | - 0.33
| ? |

2 | 1 | 5 | 8 | 5/8 | - 0.63
| ? |

2 | 0 | 4 | 12 | 4/12 | - 0.33
| ? |

3 | 1 | 1 | 8 | 1/8 | - 0.13
| ? |

3 | 0 | 4 | 12 | 4/12 | - 0.33
| ? |

Thanks for your help. Please be blunt if I’m missing something obvious.

**CODE BELOW:**

*CALCULATING THE PROPORTION OF PATIENTS THAT WERE CODED FOR EACH PROCEDURE AT LEAST ONCE IN THE FOLLOW-UP PERIOD*;

**STEP 1**

*COUNTS FOR DENOMENATORS*;

*COUNT UNIQUE PATIENTS*;

**PROC** **SQL**;

CREATE TABLE work.Denominators AS

SELECT DISTINCT exposed, count(*) AS Denominator

FROM (SELECT DISTINCT exposed, subject FROM work.SourcePopn)

GROUP BY exposed;

**QUIT**;

**STEP 2**

*COUNT UP PROCEDURS FOR NUMERATOR*;

**PROC** **SQL**;

CREATE TABLE work.Numerators AS

SELECT DISTINCT exposed, procedure, count(*) AS count

FROM (SELECT DISTINCT subject, procedure, exposed FROM work.DATA_SET_A)

GROUP BY exposed, procedure

ORDER BY exposed DESC, count DESC;

**QUIT**;

**STEP 3**

*THIS CODE ATTACHES THE DENOMENATOR AND DOES PROPORTION CALCULATION*;

**PROC** **SQL**;

CREATE TABLE work.DATA_SET_B AS

SELECT a.procedure, a.exposed, a.count, b.denominator,

(a.count/b.denominator) AS Proportion

FROM work.Numerators AS a LEFT JOIN work.Denominators as b

ON a.exposed=b.exposed

ORDER BY exposed DESC, count DESC;

**RUN**;

Accepted Solutions

Solution

11-21-2013
04:28 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SasUser324

11-21-2013 04:28 PM

The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.

--

Paige Miller

Paige Miller

All Replies

Solution

11-21-2013
04:28 PM

- Mark as New
- Bookmark
- Subscribe
- RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Posted in reply to SasUser324

11-21-2013 04:28 PM

The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.

--

Paige Miller

Paige Miller