Hello all, I can’t seem to wrap my mind around this and was wondering if anyone could help clarify things. How do I calculate the standard deviation of calculations of SQL count data that was collapsed across observations and then how do I carry that through the calculations I’m executing in SQL? This is hard to explain so I made an illustrative example and tried to describe clearly the calculation I’m doing and the standard deviation I’m looking for. I have a hunch I’m overlooking some fundamental concept here. Let’s assume I have a cohort of 20 people (8 exposed, 12 unexposed) and I am interested in determining the proportion of people that had each of three procedures, stratified by exposure status. I can only see claims for 13 people here, 7 had no procedure and so don’t appear in this data table. My eventual goal is to generate the usual Table 1 comparing exposed to unexposed and including a standardized difference (of procedure rates). Unfortunately, I can't accomplish the first step of generating standard deviation estimates. My data is in the following format: TABLE 1. DATA_SET_A Procedure Exposed Subject 1 1 Person 1 1 1 Person 2 1 1 Person 3 1 1 Person 4 1 0 Person 5 1 0 Person 6 1 0 Person 7 1 0 Person 8 2 1 Person 1 2 1 Person 3 2 0 Person 6 2 0 Person 8 2 1 Person 9 2 1 Person 10 2 0 Person 11 2 1 Person 12 2 0 Person 13 3 0 Person 5 3 0 Person 8 3 1 Person 10 3 0 Person 11 3 0 Person 13 First, I use PROC SQL to pull population counts (stratified by exposure) from my source population, which contains records for all people [STEP 1]. Next, I use PROC SQL to collapse DATA_SET_A so each procedure is listed, for each exposure status, with the count [STEP 2]. Finally, I use LEFT JOIN to link procedure/exposure numerators with their appropriate denominators and perform the proportion calculation [STEP 3]. Please refer to the SAS code I've pasted at the bottom of this post. This code generates Table 2 (see below). How do obtain the standard deviations for these proportion estimates (see ? in the table below)? I don’t know how to obtain the standard deviation of a count that I’ve collapsed in SQL. I tried using aggregate functions such as std(count(*)) AS StdDev, but SAS didn’t like that. Do I have to obtain the standard deviation during the count step and then carry it forward or can I somehow obtain it during my proportion step? I know how to calculate the standard deviation if I transform my dataset such that each procedure is a variable instead of having a procedure variable that aligns with the counts. The problem is that my real problem involves hundreds of procedures so I can't switch them all over manually. TABLE 2. DATA_SET_B Procedure Exposed Count Denominator Proportion Std. Dev. 1 1 4 8 4/8 0.50 ? 1 0 4 12 4/12 0.33 ? 2 1 5 8 5/8 0.63 ? 2 0 4 12 4/12 0.33 ? 3 1 1 8 1/8 0.13 ? 3 0 4 12 4/12 0.33 ? Thanks for your help. Please be blunt if I’m missing something obvious. CODE BELOW: *CALCULATING THE PROPORTION OF PATIENTS THAT WERE CODED FOR EACH PROCEDURE AT LEAST ONCE IN THE FOLLOW-UP PERIOD*; STEP 1 *COUNTS FOR DENOMENATORS*; *COUNT UNIQUE PATIENTS*; PROC SQL; CREATE TABLE work.Denominators AS SELECT DISTINCT exposed, count(*) AS Denominator FROM (SELECT DISTINCT exposed, subject FROM work.SourcePopn) GROUP BY exposed; QUIT; STEP 2 *COUNT UP PROCEDURS FOR NUMERATOR*; PROC SQL; CREATE TABLE work.Numerators AS SELECT DISTINCT exposed, procedure, count(*) AS count FROM (SELECT DISTINCT subject, procedure, exposed FROM work.DATA_SET_A) GROUP BY exposed, procedure ORDER BY exposed DESC, count DESC; QUIT; STEP 3 *THIS CODE ATTACHES THE DENOMENATOR AND DOES PROPORTION CALCULATION*; PROC SQL; CREATE TABLE work.DATA_SET_B AS SELECT a.procedure, a.exposed, a.count, b.denominator, (a.count/b.denominator) AS Proportion FROM work.Numerators AS a LEFT JOIN work.Denominators as b ON a.exposed=b.exposed ORDER BY exposed DESC, count DESC; RUN;
... View more