Hi,
I’ve been manually recalculating the standardized mean difference (SMD) for categorical variables, but the values I calculate differ from those reported by PROC PSMATCH. Below is my approach and code for one specific variable, PCDK46.
This code generates the propensity score weights (_ATTWgt_) and outputs the standardized differences table.
ods graphics on;
proc psmatch data=ac_padsl region=allobs;
  class PSTUDYID PECOGBL PCDK46 PHER2 PRACE PSTAGE PSTYESYN PSTNOYN PSTUNKYN PRASIAYN PRAFRYN PRWHTYN PROTHYN PRUNKYN;
  psmodel PSTUDYID (Treated='Study') = PAGE PECOGBL PCDK46 PLINES PRACE PSTAGE PHER2;
  psweight weight=attwgt nlargestwgt=6;
  assess lps var=(PAGE PECOGBL PCDK46 PLINES PSTYESYN PSTNOYN PSTUNKYN PRASIAYN PRAFRYN PRWHTYN PROTHYN PRUNKYN PHER2)
    / varinfo plots=(barchart boxplot(display=(lps PAGE)) wgtcloud);
  id PAGE PLINES PECOGBL PCDK46 PHER2;
  output out(obs=all)=ac_OutEx1 weight=_ATTWgt_;
  ods output StdDiff=ac_myStdDiff;
run;
ods graphics off;
Weighted Prevalence of PCDK46 in Each Group
/* Treatment group prevalence */
proc freq data=ac_OutEx1 noprint;
  where pstudyid = "Study";
  tables PCDK46 / nocol norow out=treatment_output (rename=(percent=prevalence_treatment));
  weight _ATTWgt_;
run;
/* Control group prevalence */
proc freq data=ac_OutEx1 noprint;
  where pstudyid = "Flatiron";
  tables PCDK46 / nocol norow out=control_output (rename=(percent=prevalence_control));
  weight _ATTWgt_;
run;
Merge Treatment and Control Data to Calculate SMD
data combined_output;
  merge treatment_output (keep=PCDK46 prevalence_treatment)
        control_output (keep=PCDK46 prevalence_control);
  by PCDK46;
run;
/* Calculate SMD */
data smd_result_PCDK46;
  set combined_output;
  /* Keep only the "Y" level, so the result does not depend on row order */
  if PCDK46 = "Y";
  /* Convert prevalence (percent) to proportions */
  p_treatment = prevalence_treatment / 100;
  p_control = prevalence_control / 100;
  /* SMD formula: difference in proportions over the pooled binomial SD */
  smd = (p_treatment - p_control) / sqrt(
    ((p_treatment * (1 - p_treatment)) + (p_control * (1 - p_control))) / 2
  );
  variable = "PCDK46";
  keep variable smd;
run;
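For reference, here is what the steps above compute, written out. The symbols are my own notation, not taken from the PROC PSMATCH documentation: $w_i$ is the _ATTWgt_ value for observation $i$, $T$ is the "Study" group, $C$ is the "Flatiron" group, and $\hat p_g$ is the weighted share of PCDK46 = "Y" in group $g$. Note that the weighted proportions appear in both the numerator and the denominator.

```latex
\[
\hat p_g \;=\; \frac{\sum_{i \in g} w_i \,\mathbf{1}[\mathrm{PCDK46}_i = \text{Y}]}{\sum_{i \in g} w_i},
\qquad g \in \{T, C\}
\]
\[
d \;=\; \frac{\hat p_T - \hat p_C}
{\sqrt{\dfrac{\hat p_T\,(1 - \hat p_T) + \hat p_C\,(1 - \hat p_C)}{2}}}
\]
```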
From this I get 0.063704264 for PCDK46, but PROC PSMATCH reports 0.05313. Can anyone provide any insights?
Yeah, I tried that but no luck, and this is what nlargestwgt does. Removing it has no impact. It seems like I need a corrected formula for the variance of the weighted probabilities.
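One thing that may be worth checking, offered as a sketch rather than a confirmed explanation: as far as I recall, the ASSESS statement has a STDDEV= option whose default is POOLED(ALLOBS=YES), which would mean the standard deviation in the denominator comes from all observations rather than from the weighted sample. If that reading is right, only the numerator should use the weighted prevalences. The code below tests that idea; it reuses ac_OutEx1 and combined_output from the earlier steps, and the names treatment_all, control_all, prev_all_treatment, prev_all_control, and smd_allobs are made up for illustration.

```sas
/* Sketch only: assumes the denominator uses unweighted, all-observation
   prevalences. This is an unconfirmed guess about the PROC PSMATCH default. */

/* Unweighted prevalence per group (no WEIGHT statement) */
proc freq data=ac_OutEx1 noprint;
  where pstudyid = "Study";
  tables PCDK46 / out=treatment_all (rename=(percent=prev_all_treatment));
run;

proc freq data=ac_OutEx1 noprint;
  where pstudyid = "Flatiron";
  tables PCDK46 / out=control_all (rename=(percent=prev_all_control));
run;

/* Weighted difference in the numerator, unweighted pooled SD below it */
data smd_allobs_PCDK46;
  merge combined_output (keep=PCDK46 prevalence_treatment prevalence_control)
        treatment_all   (keep=PCDK46 prev_all_treatment)
        control_all     (keep=PCDK46 prev_all_control);
  by PCDK46;
  if PCDK46 = "Y";                      /* keep only the level of interest */
  p_t = prevalence_treatment / 100;     /* weighted proportions */
  p_c = prevalence_control / 100;
  q_t = prev_all_treatment / 100;       /* unweighted proportions */
  q_c = prev_all_control / 100;
  smd_allobs = (p_t - p_c) / sqrt((q_t * (1 - q_t) + q_c * (1 - q_c)) / 2);
  variable = "PCDK46";
  keep variable smd_allobs;
run;
```

If smd_allobs lands close to 0.05313, the denominator is the likely source of the gap. A small remaining difference could still come from how the variance itself is computed (for example, a sample variance with an n - 1 divisor instead of p(1 - p)), which I have not verified.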