Hello all,
I can’t seem to wrap my mind around this and was wondering if anyone could help clarify things. How do I calculate the standard deviation of calculations of SQL count data that was collapsed across observations and then how do I carry that through the calculations I’m executing in SQL? This is hard to explain so I made an illustrative example and tried to describe clearly the calculation I’m doing and the standard deviation I’m looking for. I have a hunch I’m overlooking some fundamental concept here.
Let’s assume I have a cohort of 20 people (8 exposed, 12 unexposed) and I am interested in determining the proportion of people that had each of three procedures, stratified by exposure status. I can only see claims for 13 people here, 7 had no procedure and so don’t appear in this data table. My eventual goal is to generate the usual Table 1 comparing exposed to unexposed and including a standardized difference (of procedure rates). Unfortunately, I can't accomplish the first step of generating standard deviation estimates.
My data is in the following format:
TABLE 1. DATA_SET_A
Procedure | Exposed | Subject |
1 | 1 | Person 1 |
1 | 1 | Person 2 |
1 | 1 | Person 3 |
1 | 1 | Person 4 |
1 | 0 | Person 5 |
1 | 0 | Person 6 |
1 | 0 | Person 7 |
1 | 0 | Person 8 |
2 | 1 | Person 1 |
2 | 1 | Person 3 |
2 | 0 | Person 6 |
2 | 0 | Person 8 |
2 | 1 | Person 9 |
2 | 1 | Person 10 |
2 | 0 | Person 11 |
2 | 1 | Person 12 |
2 | 0 | Person 13 |
3 | 0 | Person 5 |
3 | 0 | Person 8 |
3 | 1 | Person 10 |
3 | 0 | Person 11 |
3 | 0 | Person 13 |
Please refer to the SAS code I've pasted at the bottom of this post.
This code generates Table 2 (see below). How do obtain the standard deviations for these proportion estimates (see ? in the table below)? I don’t know how to obtain the standard deviation of a count that I’ve collapsed in SQL. I tried using aggregate functions such as std(count(*)) AS StdDev, but SAS didn’t like that. Do I have to obtain the standard deviation during the count step and then carry it forward or can I somehow obtain it during my proportion step? I know how to calculate the standard deviation if I transform my dataset such that each procedure is a variable instead of having a procedure variable that aligns with the counts. The problem is that my real problem involves hundreds of procedures so I can't switch them all over manually.
TABLE 2. DATA_SET_B
Procedure | Exposed | Count | Denominator | Proportion |
| |
1 | 1 | 4 | 8 | 4/8 |
| ? |
1 | 0 | 4 | 12 | 4/12 |
| ? |
2 | 1 | 5 | 8 | 5/8 |
| ? |
2 | 0 | 4 | 12 | 4/12 |
| ? |
3 | 1 | 1 | 8 | 1/8 |
| ? |
3 | 0 | 4 | 12 | 4/12 |
| ? |
Thanks for your help. Please be blunt if I’m missing something obvious.
CODE BELOW:
*CALCULATING THE PROPORTION OF PATIENTS THAT WERE CODED FOR EACH PROCEDURE AT LEAST ONCE IN THE FOLLOW-UP PERIOD*;
STEP 1
*COUNTS FOR DENOMENATORS*;
*COUNT UNIQUE PATIENTS*;
PROC SQL;
CREATE TABLE work.Denominators AS
SELECT DISTINCT exposed, count(*) AS Denominator
FROM (SELECT DISTINCT exposed, subject FROM work.SourcePopn)
GROUP BY exposed;
QUIT;
STEP 2
*COUNT UP PROCEDURS FOR NUMERATOR*;
PROC SQL;
CREATE TABLE work.Numerators AS
SELECT DISTINCT exposed, procedure, count(*) AS count
FROM (SELECT DISTINCT subject, procedure, exposed FROM work.DATA_SET_A)
GROUP BY exposed, procedure
ORDER BY exposed DESC, count DESC;
QUIT;
STEP 3
*THIS CODE ATTACHES THE DENOMENATOR AND DOES PROPORTION CALCULATION*;
PROC SQL;
CREATE TABLE work.DATA_SET_B AS
SELECT a.procedure, a.exposed, a.count, b.denominator,
(a.count/b.denominator) AS Proportion
FROM work.Numerators AS a LEFT JOIN work.Denominators as b
ON a.exposed=b.exposed
ORDER BY exposed DESC, count DESC;
RUN;
The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.
The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.