smackerz1988
Pyrite | Level 9

Hi,

 

I’ve been manually recalculating the standardized mean difference (SMD) for categorical variables, but the values I get differ from those reported by PROC PSMATCH. Below is my approach and code for one specific variable, PCDK46.

 

Step 1: My PROC PSMATCH Code

This code generates the propensity score weights (_ATTWgt_) and outputs the standardized differences table.

ods graphics on;
proc psmatch data=ac_padsl region=allobs; 
   class PSTUDYID PECOGBL PCDK46 PHER2 PRACE PSTAGE PSTYESYN PSTNOYN PSTUNKYN PRASIAYN PRAFRYN PRWHTYN PROTHYN PRUNKYN;  
   psmodel PSTUDYID (Treated='Study') = PAGE PECOGBL PCDK46 PLINES PRACE PSTAGE PHER2;
   psweight weight=attwgt nlargestwgt=6; 
   assess lps var=(PAGE PECOGBL PCDK46 PLINES PSTYESYN PSTNOYN PSTUNKYN PRASIAYN PRAFRYN PRWHTYN PROTHYN PRUNKYN PHER2) 
          / varinfo plots=(barchart boxplot(display=(lps PAGE)) wgtcloud);  
   id PAGE PLINES PECOGBL PCDK46 PHER2;  
   output out(obs=all)=ac_OutEx1 weight=_ATTWgt_;
   ods output StdDiff=ac_myStdDiff; 
run;
ods graphics off;

Step 2: Manual Calculation of Weighted Prevalence and SMD

Weighted Prevalence for PCDK46

 

/* Treatment group prevalence */
proc freq data=ac_OutEx1 noprint;
    where pstudyid = "Study";
    tables PCDK46 / nocol norow out=treatment_output (rename=(percent=prevalence_treatment));
    weight _ATTWgt_;
run;

/* Control group prevalence */
proc freq data=ac_OutEx1 noprint;
    where pstudyid = "Flatiron";
    tables PCDK46 / nocol norow out=control_output (rename=(percent=prevalence_control));
    weight _ATTWgt_;
run;

Merge Treatment and Control Data to Calculate SMD

 

data combined_output;
    merge treatment_output (keep=PCDK46 prevalence_treatment)
          control_output (keep=PCDK46 prevalence_control);
    by PCDK46;
run;

/* Calculate SMD for the "Y" level of PCDK46 */
data smd_result_PCDK46;
    set combined_output;

    /* Keep only the "Y" row so the result does not depend on row order */
    if PCDK46 = "Y";

    /* Convert prevalence (percent) to proportions */
    p_treatment = prevalence_treatment / 100;
    p_control = prevalence_control / 100;

    /* SMD formula: difference in proportions over the pooled binomial SD */
    smd = (p_treatment - p_control) / sqrt(
            ((p_treatment * (1 - p_treatment)) + (p_control * (1 - p_control))) / 2
         );

    variable = "PCDK46";
    keep variable smd;
run;
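
For reference, the formula coded in the DATA step above can be written out as follows. The symbols p_T and p_C are notation introduced here for the ATT-weighted proportions of PCDK46 = "Y" in the "Study" and "Flatiron" groups; note that both the numerator and the denominator use the weighted proportions.

```latex
\mathrm{SMD} =
  \frac{\hat{p}_T - \hat{p}_C}
       {\sqrt{\dfrac{\hat{p}_T\,(1 - \hat{p}_T) + \hat{p}_C\,(1 - \hat{p}_C)}{2}}}
```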

From this I get 0.063704264 for PCDK46, but PROC PSMATCH reports 0.05313. Can anyone provide any insight?

 

 

4 REPLIES
quickbluefish
Barite | Level 11
Does the variable PCDK46 have exactly 2 levels and no missing values?
quickbluefish
Barite | Level 11
The only thing I can think of is to run PSMATCH with PCDK46 as the only explanatory variable in the PSMODEL statement and see whether the associated SMD (from the StdDiff table output) matches what you're currently getting. I also don't know offhand what the 'nlargestwgt' argument does and whether it's affecting things.
smackerz1988
Pyrite | Level 9

Yeah, I tried that but no luck, and this is what nlargestwgt does (screenshot below). Removing it has no impact. It seems like I need a corrected formula for the variance of the weighted probabilities.
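
One way to test the denominator idea is sketched below. It rests on an assumption that nobody in this thread has confirmed: that PSMATCH divides the weighted mean difference by a pooled standard deviation computed from the unweighted observations, rather than from the weighted proportions. The data set names _chk, _wmean, _uvar and smd_check and the variable cdk46_y are made up for illustration. The sketch also assumes PCDK46 has no missing values.

```sas
/* Sketch only: weighted mean difference divided by an UNWEIGHTED pooled SD.
   Assumption (not confirmed in this thread): PSMATCH uses an unweighted
   SD in the denominator of the weighted standardized difference. */
data _chk;                               /* hypothetical work data set */
    set ac_OutEx1;
    cdk46_y = (PCDK46 = "Y");            /* 0/1 indicator for the "Y" level */
run;

/* Numerator pieces: ATT-weighted prevalence per group */
proc means data=_chk noprint nway;
    class PSTUDYID;
    var cdk46_y;
    weight _ATTWgt_;
    output out=_wmean mean=wmean;
run;

/* Denominator pieces: unweighted sample variance per group */
proc means data=_chk noprint nway;
    class PSTUDYID;
    var cdk46_y;
    output out=_uvar var=uvar;
run;

/* Combine the two groups into a single SMD value */
data smd_check;
    merge _wmean(keep=PSTUDYID wmean) _uvar(keep=PSTUDYID uvar) end=last;
    by PSTUDYID;
    retain m_t m_c v_t v_c;
    if PSTUDYID = "Study" then do; m_t = wmean; v_t = uvar; end;
    else do; m_c = wmean; v_c = uvar; end;
    if last then do;
        smd_unweighted_sd = (m_t - m_c) / sqrt((v_t + v_c) / 2);
        output;
    end;
    keep smd_unweighted_sd;
run;
```

If smd_unweighted_sd lands close to 0.05313, the gap comes from the denominator. If it does not, the assumption is wrong, and the PROC PSMATCH documentation on how the ASSESS statement computes the standard deviation is the next place to look.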

 

[Screenshot: smackerz1988_0-1734521488943.png]
