Hello all,
I can’t seem to wrap my mind around this and was wondering if anyone could help clarify things. How do I calculate the standard deviation of calculations of SQL count data that was collapsed across observations and then how do I carry that through the calculations I’m executing in SQL? This is hard to explain so I made an illustrative example and tried to describe clearly the calculation I’m doing and the standard deviation I’m looking for. I have a hunch I’m overlooking some fundamental concept here.
Let’s assume I have a cohort of 20 people (8 exposed, 12 unexposed) and I am interested in determining the proportion of people that had each of three procedures, stratified by exposure status. I can only see claims for 13 people here, 7 had no procedure and so don’t appear in this data table. My eventual goal is to generate the usual Table 1 comparing exposed to unexposed and including a standardized difference (of procedure rates). Unfortunately, I can't accomplish the first step of generating standard deviation estimates.
My data is in the following format:
TABLE 1. DATA_SET_A
Procedure | Exposed | Subject |
1 | 1 | Person 1 |
1 | 1 | Person 2 |
1 | 1 | Person 3 |
1 | 1 | Person 4 |
1 | 0 | Person 5 |
1 | 0 | Person 6 |
1 | 0 | Person 7 |
1 | 0 | Person 8 |
2 | 1 | Person 1 |
2 | 1 | Person 3 |
2 | 0 | Person 6 |
2 | 0 | Person 8 |
2 | 1 | Person 9 |
2 | 1 | Person 10 |
2 | 0 | Person 11 |
2 | 1 | Person 12 |
2 | 0 | Person 13 |
3 | 0 | Person 5 |
3 | 0 | Person 8 |
3 | 1 | Person 10 |
3 | 0 | Person 11 |
3 | 0 | Person 13 |
Please refer to the SAS code I've pasted at the bottom of this post.
This code generates Table 2 (see below). How do obtain the standard deviations for these proportion estimates (see ? in the table below)? I don’t know how to obtain the standard deviation of a count that I’ve collapsed in SQL. I tried using aggregate functions such as std(count(*)) AS StdDev, but SAS didn’t like that. Do I have to obtain the standard deviation during the count step and then carry it forward or can I somehow obtain it during my proportion step? I know how to calculate the standard deviation if I transform my dataset such that each procedure is a variable instead of having a procedure variable that aligns with the counts. The problem is that my real problem involves hundreds of procedures so I can't switch them all over manually.
TABLE 2. DATA_SET_B
Procedure | Exposed | Count | Denominator | Proportion |
| |
1 | 1 | 4 | 8 | 4/8 |
| ? |
1 | 0 | 4 | 12 | 4/12 |
| ? |
2 | 1 | 5 | 8 | 5/8 |
| ? |
2 | 0 | 4 | 12 | 4/12 |
| ? |
3 | 1 | 1 | 8 | 1/8 |
| ? |
3 | 0 | 4 | 12 | 4/12 |
| ? |
Thanks for your help. Please be blunt if I’m missing something obvious.
CODE BELOW:
*CALCULATING THE PROPORTION OF PATIENTS THAT WERE CODED FOR EACH PROCEDURE AT LEAST ONCE IN THE FOLLOW-UP PERIOD*;
STEP 1
*COUNTS FOR DENOMENATORS*;
*COUNT UNIQUE PATIENTS*;
PROC SQL;
CREATE TABLE work.Denominators AS
SELECT DISTINCT exposed, count(*) AS Denominator
FROM (SELECT DISTINCT exposed, subject FROM work.SourcePopn)
GROUP BY exposed;
QUIT;
STEP 2
*COUNT UP PROCEDURS FOR NUMERATOR*;
PROC SQL;
CREATE TABLE work.Numerators AS
SELECT DISTINCT exposed, procedure, count(*) AS count
FROM (SELECT DISTINCT subject, procedure, exposed FROM work.DATA_SET_A)
GROUP BY exposed, procedure
ORDER BY exposed DESC, count DESC;
QUIT;
STEP 3
*THIS CODE ATTACHES THE DENOMENATOR AND DOES PROPORTION CALCULATION*;
PROC SQL;
CREATE TABLE work.DATA_SET_B AS
SELECT a.procedure, a.exposed, a.count, b.denominator,
(a.count/b.denominator) AS Proportion
FROM work.Numerators AS a LEFT JOIN work.Denominators as b
ON a.exposed=b.exposed
ORDER BY exposed DESC, count DESC;
RUN;
The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.
The standard deviation of a proportion is sqrt(p*(1-p)/n), where p is the estimated proportion; should be easy enough to compute from PROC SQL or otherwise.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.