Statistical Procedures

Programming the statistical procedures from SAS
amrora
Calcite | Level 5

Hello,

 

I have aggregate data (no person-level data; only counts and percentages) that include predictor variables with multiple levels (age in the screenshot below has 5 levels) across 3 levels of an outcome (n and % across the top of the screenshot below). My mentor is asking me to calculate standard differences for each categorical predictor variable across the 3 levels of the outcome (realizing that we'll have to compare 2 vs 1 and 3 vs 1, instead of 1 vs 2 vs 3). I see that to use PROC PSMATCH, the predictor variables have to be binary (0/1), but I don't think I can do this with the aggregated data that I have.

Does anyone know how I can calculate standard differences for categorical predictor variables as shown below?

[Screenshot: amrora_0-1680897394276.png]


Thank you in advance for your help!

12 REPLIES
ballardw
Super User

Please post a link to your definition of "standard difference".

 

My searches turn up too many radically different things containing that phrase for me to want to spend any time guessing which one applies.

 

If you are unable to post data step code describing your data then please at least post simple text in a text box opened on the forum with the </> icon that appears above the message window.

And indicate which are "outcomes". Four identical column headings obscure which column serves which purpose.

 

BTW there are procedures that do tests across multiple levels but the data needs to be in a reasonable form for specific tests.

Rick_SAS
SAS Super FREQ

I'm guessing that the OP knows about and has read Yang and Dalton ("A unified approach to measuring the effect size between two groups using SAS", SAS... which shows how to calculate "standardized difference scores" for two groups. The definition is on pp 2-3. Their macro is available from the Cleveland Clinic at https://www.lerner.ccf.org/quantitative-health/documents/stddiff.sas

 

It sounds like the OP wants to compute a similar measure for more than two groups. 
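
For readers who have not seen the paper, the two-group standardized difference is usually written as below. This is a sketch of the commonly used forms, which I believe match Yang and Dalton's definitions; the notation (means \bar{x}, standard deviations s, proportions p, and vectors T and C holding the category proportions of the two groups with one category dropped) is chosen here for illustration, so check the paper before relying on the details.

```latex
% Continuous variable: group means \bar{x}_1, \bar{x}_2 and standard deviations s_1, s_2
\[
d = \frac{\bar{x}_2 - \bar{x}_1}{\sqrt{\left(s_1^2 + s_2^2\right)/2}}
\]

% Binary variable: group proportions p_1, p_2
\[
d = \frac{p_2 - p_1}{\sqrt{\left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]/2}}
\]

% Categorical variable with K levels: T and C are (K-1)-vectors of proportions
\[
d = \sqrt{(T - C)^{\top} S^{-1} (T - C)},
\qquad
S_{kl} =
\begin{cases}
\dfrac{T_k(1-T_k) + C_k(1-C_k)}{2}, & k = l,\\[2ex]
-\dfrac{T_k T_l + C_k C_l}{2}, & k \neq l.
\end{cases}
\]
```

All three forms compare exactly two groups, which is why an outcome with 3 levels has to be handled as a set of pairwise comparisons.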

 

 

amrora
Calcite | Level 5

Hi, I have read that article and had been trying to use the macro without success until this morning. I think I got it to work with my data (count data, see below). I knew that I could only compare outcomes 2 vs 1 and 3 vs 1, but I was trying to figure out how to calculate a single standardized difference for all of age (5 levels). I realized this morning that I had to calculate an SMD for each level of age (as binary 0/1). I'm waiting on feedback from my mentor.

 

Here is the data:

Data age;
Input level age_grp severity count;
cards;
0 0 0 34967
0 1 0 109368
0 0 1 674
0 1 1 5992
0 0 2 133
0 1 2 4790
1 0 0 44727
1 1 0 99608
1 0 1 1293
1 1 1 5373
1 0 2 1342
1 1 2 3581
2 0 0 29442
2 1 0 114893
2 0 1 1074
2 1 1 5592
2 0 2 2035
2 1 2 2888
3 0 0 30257
3 1 0 114078
3 0 1 3077
3 1 1 3589
3 0 2 1314
3 1 2 3609
4 0 0 4942
4 1 0 139393
4 0 1 548
4 1 1 6118
4 0 2 99
4 1 2 4824
;
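
A minimal sketch of the per-level calculation done directly from these counts, without the macro. It assumes the `age` data set above and that `age_grp=0` marks membership in the age level (the counts suggest this, since the `age_grp=0` rows add up to each severity group's total); the data set names `props`, `wide`, and `smd` and the `smd_*` variable names are made up for this example. It applies the usual two-proportion standardized difference, with severity 0 as the reference group.

```sas
/* Proportion of each severity group that falls in each age level.      */
/* Assumption: age_grp=0 means "in this level", age_grp=1 means "not".  */
proc sql;
  create table props as
  select level, severity,
         sum(count * (age_grp = 0)) / sum(count) as p
  from age
  group by level, severity
  order by level, severity;
quit;

/* One row per age level, with the proportions side by side: p0, p1, p2 */
proc transpose data=props out=wide(drop=_name_) prefix=p;
  by level;
  id severity;
  var p;
run;

/* Binary standardized difference for severity 1 vs 0 and 2 vs 0 */
data smd;
  set wide;
  smd_1_vs_0 = (p1 - p0) / sqrt((p0*(1 - p0) + p1*(1 - p1)) / 2);
  smd_2_vs_0 = (p2 - p0) / sqrt((p0*(1 - p0) + p2*(1 - p2)) / 2);
run;

proc print data=smd noobs;
run;
```

This gives one SMD per age level and comparison, which is the "each level as binary 0/1" approach described above rather than a single SMD for age as a whole.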

amandiyliwo
Fluorite | Level 6

I am working on doing the same and need help. How did you end up resolving this issue?

 

The macro, as stated before, works for two groups only. Are there any other macros we can use for multiple groups, or any Base SAS code we can write to calculate the SMD? Can someone help answer this?

pink_poodle
Barite | Level 11

To find the standard difference, you can use a formula. For example, there are two buckets “1” and “2” with some stuff. What is standard difference between stuff in bucket “2” vs “1”? To standardize relative to bucket #1, we need to divide by the amount of stuff in this bucket. So, the standard difference is the amount in bucket “2” minus “1” divided by “1”. For example, the standard difference in the amount of 🥔 🥔 (potatoes) between buckets # 2 and # 1 is 0.2, meaning that bucket 2 has 20% more potatoes relative to bucket #1:

Std_diff_2_to_1 = (b2 - b1)/b1;

We can multiply by 100 to get the percent.

quickbluefish
Lapis Lazuli | Level 10

The macro referenced above is flawed in the way it calculates standard diffs for variables with more than two levels -- this is specifically because of the way it stores an array of floating point numbers in a single macro variable string.  The result is that things get rounded pretty substantially, and in certain cases, this can affect the SMD quite a lot.  I've had an email exchange with the authors of that paper and they are aware of the problem.  At some point, I re-wrote the macro in such a way that it avoids those two steps ('select ... into :var separated by...' syntax) and the results of that macro exactly match one written in a completely different way with PROC IML.  The process of calculating these SMDs is much more involved than doing it for variables with only two levels. 
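
To illustrate why the multi-level case is more involved, here is a rough PROC IML sketch (it needs a SAS/IML license) of a single SMD across all 5 age levels for severity 1 vs 0. It is not the rewritten macro mentioned above. The counts are the `age_grp=0` rows of the data posted earlier in the thread, on the assumption that those rows are the in-level counts, and the covariance matrix follows the Mahalanobis-type definition as I understand it from Yang and Dalton. Everything stays in full floating-point precision because nothing passes through a macro variable.

```sas
proc iml;
  /* In-level counts by age level (assumed: the age_grp=0 rows) */
  n0 = {34967, 44727, 29442, 30257, 4942};   /* severity 0 (reference) */
  n1 = {674, 1293, 1074, 3077, 548};         /* severity 1             */

  p0 = n0 / sum(n0);                         /* category proportions   */
  p1 = n1 / sum(n1);

  /* Drop the last category so the covariance matrix is invertible */
  k = nrow(p0);
  c = p0[1:(k-1), ];
  t = p1[1:(k-1), ];

  /* Average of the two multinomial covariance matrices:               */
  /* diagonal p(1-p), off-diagonal -p_k*p_l                            */
  S = ((diag(t) - t*T(t)) + (diag(c) - c*T(c))) / 2;

  diff = t - c;
  d = sqrt(T(diff) * inv(S) * diff);         /* one SMD for all of age */
  print d;
quit;
```

Unlike the binary formula, this version has no sign: it measures overall imbalance across the levels.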

amandiyliwo
Fluorite | Level 6

Would you be willing to share your corrected macro?

quickbluefish
Lapis Lazuli | Level 10

You're welcome to try using this 'table1' macro, which, among other things, will calculate SMDs for continuous, 2 level and >2 level categorical variables.  If you describe a bit what your data look like and what the result is you're trying to achieve, I can help you set up the call to the macro correctly.  It's the program called 'table1.sas' in this github repo:

 

https://github.com/Jeremy-Smith5/CEP-public/tree/main/SAS

quickbluefish
Lapis Lazuli | Level 10
...and if you'd rather avoid that, there's an R package (not written by me) that will calculate SMDs like this. I think I've used that once or twice.
amandiyliwo
Fluorite | Level 6

Thank you so much for sharing! I tried the macro but encountered some errors, and the SMD was not being calculated for one of my variables, so it turned up as missing. Regarding my data structure, here is an example where I would like to get the SMD for the continuous variables and the proportions. Thank you so much for all your help so far with this. What am I missing? Please help me understand why it's not working.

 

Variable         TRT1          TRT2            TRT3            SD
Age              72.5 (5.9)    73.1 (6.1)      72.5 (5.8)
Gender, n (%)
    Female       406 (9.09)    1256 (28.12)    2804 (62.79)
    Male         403 (9.03)    1183 (26.50)    2878 (64.47)
Race, n (%)
    Asian        10 (8.26)     26 (21.49)      85 (70.25)
    Black        184 (8.90)    580 (28.06)     1303 (63.04)
    Hispanic     5 (8.77)      12 (21.05)      40 (70.18)
    White        610 (9.12)    1821 (27.24)    4254 (63.64)

I got the SMD for everything except the race categories. What could be the issue?

 

var     lvl            ALL       ALL_2     TRT_1     TRT_1_2   TRT_2     TRT_2_2   TRT_3     TRT_3_2   test_stat  test_stat_val  pval    has_missing  SMD_MNA2_2_vs_MNA2_1  SMD_MNA2_3_vs_MNA2_1  SMD_MNA2_3_vs_MNA2_2
Pop                    8758      1         784       0.089518  2382      0.271985  5592      0.638502                            .
Gender                                                                                                 Chi2       2.637272       0.2675               -0.03101              0.008678663           0.039693716
        Female         4414      0.503996  394       0.502551  1234      0.518052  2786      0.498212                            .
        Male           4344      0.496004  390       0.497449  1148      0.481948  2806      0.501788                            .
Race                                                                                                   Chi2       5.701208       0.4575               0.041915              0.023022193           0.058672359
        Asian          121       0.013816  10        0.012755  26        0.010915  85        0.0152                              .
        Black          1949      0.222539  172       0.219388  558       0.234257  1219      0.21799                             .
        Hispanic       57        0.006508  5         0.006378  12        0.005038  40        0.007153                            .
        White          6631      0.757136  597       0.76148   1786      0.74979   4248      0.759657                            .
Age     Mean (StdDev)  72.63896  5.883168  72.52296  5.937476  73.08564  6.039716  72.46495  5.798456                            .                    0.093956              -0.009885058          -0.104841103
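
As a quick sanity check on the output, the continuous SMD for Age can be recomputed by hand from the group means and standard deviations in the Age row. This is only a sketch using the usual pooled-SD formula, not the macro's own code; the variable names `m1`, `s1`, `m2`, `s2` are made up for the example.

```sas
/* Recompute the Age SMD for group 2 vs group 1 from the summary statistics */
data _null_;
  m1 = 72.52296;  s1 = 5.937476;   /* TRT_1 mean and std dev */
  m2 = 73.08564;  s2 = 6.039716;   /* TRT_2 mean and std dev */
  smd = (m2 - m1) / sqrt((s1**2 + s2**2) / 2);
  put smd= 8.6;                    /* written to the SAS log */
run;
```

The result comes out at about 0.094, in line with the 0.093956 reported in the SMD_MNA2_2_vs_MNA2_1 column.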

 

Here is the macro call:

 

/* Include the macro file */
%INCLUDE 'C:\FOLDER\table 1 macro.txt';

/* Call the %table1 macro */
%table1(
personfile=test, /* Input dataset */
stratvars=trt, /* Stratify only by mna2 */
rowvars=
gender | /* Gender as a row variable */
race | /* Race as a row variable */
age/mean std, /* Age as a continuous variable with mean and std; the comma ends the rowvars list */
uselabels=1, /* Use variable labels if available */
pvalues=1, /* Include p-values for group comparisons */
printSMD=1 /* Calculate and display standardized mean differences */
);

amandiyliwo
Fluorite | Level 6

Just want to add that I reran it and yes the macro worked! No missing. Thank you so much!

quickbluefish
Lapis Lazuli | Level 10
Great! One thing - if your STRATVAR has more than 2 levels (looks like you have 3), the macro will not calculate p-values (only SMDs). If you really need p-values for each pair-wise comparison of the STRATVAR, you would need to run the macro multiple times (once for each pair of cohorts) and merge them together. Very awkward, but works. But SMDs should work regardless of how many columns - there will be one for each pair.
