Hello,
I have aggregate data (no person-level data; only counts and percentages) with categorical predictor variables that have multiple levels (age in the screenshot below has 5 levels) across 3 levels of an outcome (n and % across the top of the screenshot below). My mentor is asking me to calculate standard differences for each categorical predictor variable across the 3 levels of the outcome (realizing that we'll have to compare 2 vs 1 and 3 vs 1, rather than 1 vs 2 vs 3). I see that PROC PSMATCH requires the predictor variables to be binary (0/1), but I don't think I can do that with the aggregated data that I have.
Does anyone know how I can calculate standard differences for categorical predictor variables as shown below?
Thank you in advance for your help!
Please post a link to your definition of "standard difference".
My searches turn up too many radically different things containing that phrase to want to spend any time guessing which is applicable.
If you are unable to post data step code describing your data then please at least post simple text in a text box opened on the forum with the </> icon that appears above the message window.
Also indicate which variables are the "outcomes". Four identical column headings obscure which column serves which purpose.
BTW there are procedures that do tests across multiple levels but the data needs to be in a reasonable form for specific tests.
I'm guessing that the OP knows about and has read Yang and Dalton ("A unified approach to measuring the effect size between two groups using SAS", SAS...), which shows how to calculate "standardized difference scores" for two groups. The definition is on pp. 2-3. Their macro is available from the Cleveland Clinic at https://www.lerner.ccf.org/quantitative-health/documents/stddiff.sas
It sounds like the OP wants to compute a similar measure for more than two groups.
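For readers who do not have the paper at hand, these are the two-group forms of the standardized difference as they are commonly written for a continuous variable (group means and standard deviations) and for a binary variable (group proportions). I believe they match the definitions in the paper, but please check pp. 2-3 before relying on them; the subscripts 1 and 2 simply label the two groups being compared.

```latex
d_{\text{continuous}} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(s_1^2 + s_2^2\right)/2}},
\qquad
d_{\text{binary}} = \frac{\hat{p}_1 - \hat{p}_2}
{\sqrt{\left[\hat{p}_1\left(1-\hat{p}_1\right) + \hat{p}_2\left(1-\hat{p}_2\right)\right]/2}}
```

With more than two groups, one pragmatic option is to apply these formulas to each pair of groups separately.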
Hi, I have read that article and had been trying to use the macro without success until this morning. I think I got it to work with my data (count data, see below). I knew that I could only compare outcomes 2 vs 1 and 3 vs 1, but I was trying to figure out how to calculate a single standardized difference for all of age (5 levels). I realized this morning that I had to calculate an SMD for each level of age (coded as binary 0/1). I'm waiting on feedback from my mentor.
Here is the data:
Data age;
Input level age_grp severity count;
cards;
0 0 0 34967
0 1 0 109368
0 0 1 674
0 1 1 5992
0 0 2 133
0 1 2 4790
1 0 0 44727
1 1 0 99608
1 0 1 1293
1 1 1 5373
1 0 2 1342
1 1 2 3581
2 0 0 29442
2 1 0 114893
2 0 1 1074
2 1 1 5592
2 0 2 2035
2 1 2 2888
3 0 0 30257
3 1 0 114078
3 0 1 3077
3 1 1 3589
3 0 2 1314
3 1 2 3609
4 0 0 4942
4 1 0 139393
4 0 1 548
4 1 1 6118
4 0 2 99
4 1 2 4824
;
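In case it helps anyone following along, here is a minimal sketch of how the per-level SMDs could be computed directly from these counts without the macro. It assumes the AGE dataset above has been created, takes severity 0 as the reference group, and uses the usual two-group formula for a binary variable. The names age_p, age_wide, age_smd and the p_sev* variables are my own illustrative choices. The sign of each result depends on which value of age_grp is treated as 1; the magnitude does not.

```sas
/* Sketch only: assumes the AGE dataset from the DATA step above exists.    */
/* p = share of each severity group with age_grp = 1, per age level.        */
proc sql;
  create table age_p as
  select a.level, a.severity,
         sum(a.count * (a.age_grp = 1)) / sum(a.count) as p
  from age as a
  group by a.level, a.severity
  order by a.level, a.severity;
quit;

/* One row per age level, one column per severity group: p_sev0-p_sev2 */
proc transpose data=age_p out=age_wide(drop=_name_) prefix=p_sev;
  by level;
  id severity;
  var p;
run;

/* Binary-variable standardized difference, each group vs severity 0 */
data age_smd;
  set age_wide;
  smd_1_vs_0 = (p_sev1 - p_sev0)
               / sqrt((p_sev1*(1 - p_sev1) + p_sev0*(1 - p_sev0)) / 2);
  smd_2_vs_0 = (p_sev2 - p_sev0)
               / sqrt((p_sev2*(1 - p_sev2) + p_sev0*(1 - p_sev0)) / 2);
run;

proc print data=age_smd noobs;
run;
```

This gives one SMD per age level for each comparison, not a single SMD for age as a whole.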
I am working on doing the same and need help. How did you end up resolving this issue?
The macro, as stated before, works for two groups only. Are there other macros that calculate the SMD for multiple groups, or Base SAS code we could write to do it? Can someone help answer this?
One simple reading of "standard difference" is a relative difference, which you can get from a formula. For example, suppose there are two buckets, “1” and “2”, each holding some amount of stuff. What is the standard difference between the stuff in bucket “2” and bucket “1”? To standardize relative to bucket 1, we divide by the amount of stuff in that bucket. So the standard difference is the amount in bucket “2” minus the amount in bucket “1”, divided by the amount in bucket “1”. For example, if the standard difference in the amount of 🥔 (potatoes) between buckets 2 and 1 is 0.2, then bucket 2 has 20% more potatoes than bucket 1:
Std_diff_2_to_1 = (b2 - b1)/b1;
We can multiply by 100 to get the percent.
The macro referenced above is flawed in the way it calculates standard diffs for variables with more than two levels -- this is specifically because of the way it stores an array of floating point numbers in a single macro variable string. The result is that things get rounded pretty substantially, and in certain cases, this can affect the SMD quite a lot. I've had an email exchange with the authors of that paper and they are aware of the problem. At some point, I re-wrote the macro in such a way that it avoids those two steps ('select ... into :var separated by...' syntax) and the results of that macro exactly match one written in a completely different way with PROC IML. The process of calculating these SMDs is much more involved than doing it for variables with only two levels.
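To give a sense of what the multi-level calculation involves, here is a minimal PROC IML sketch of a single SMD across all levels of a categorical variable for one pair of groups. It is not the rewritten macro. It follows the Mahalanobis-type definition as I understand it from the paper: drop one level, build the averaged multinomial covariance matrix, and take the square root of the quadratic form. The counts are the age counts posted earlier in the thread for severity 0 and severity 1; I am assuming the age_grp=0 rows are the in-level counts, since they sum to each group's total.

```sas
proc iml;
  /* Age-level counts (levels 0-4) from the data posted earlier in the thread. */
  /* Assumption: the age_grp = 0 rows are the counts within each age level.    */
  n_ref = {34967, 44727, 29442, 30257, 4942};  /* severity 0 (reference)  */
  n_cmp = {674, 1293, 1074, 3077, 548};        /* severity 1 (comparison) */

  p_ref = n_ref / sum(n_ref);
  p_cmp = n_cmp / sum(n_cmp);

  /* Drop the last level: the proportions sum to 1, so the full  */
  /* covariance matrix would be singular.                        */
  k = nrow(p_ref);
  r = p_ref[1:(k-1)];
  c = p_cmp[1:(k-1)];

  /* Averaged multinomial covariance:                            */
  /*   diagonal      [r_i(1-r_i) + c_i(1-c_i)] / 2               */
  /*   off-diagonal  -[r_i r_j + c_i c_j] / 2                    */
  S = (diag(r) - r*t(r) + diag(c) - c*t(c)) / 2;

  d   = c - r;
  smd = sqrt(t(d) * inv(S) * d);
  print smd;
quit;
```

Because everything stays in full-precision numeric matrices, nothing passes through a macro variable string, which is the step where the rounding described above comes in.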
Would you be willing to share your corrected macro?
You're welcome to try this 'table1' macro, which, among other things, calculates SMDs for continuous, 2-level, and >2-level categorical variables. If you describe what your data look like and what result you're trying to achieve, I can help you set up the macro call correctly. It's the program called 'table1.sas' in this GitHub repo:
Thank you so much for sharing! I tried the macro but encountered some errors, and the standardized difference (SD) was not calculated for one of my variables; it came out as missing. Here is an example of my data structure, where I would like to get the SD for the continuous variable and for the proportions. What am I missing? Please help me understand why it's not working. Thank you so much for all your help so far.
Variable | TRT1 | TRT2 | TRT3 | SD
Age | 72.5 (5.9) | 73.1 (6.1) | 72.5 (5.8) |
Gender, n (%) | | | |
  Female | 406 (9.09) | 1256 (28.12) | 2804 (62.79) |
  Male | 403 (9.03) | 1183 (26.50) | 2878 (64.47) |
Race, n (%) | | | |
  Asian | 10 (8.26) | 26 (21.49) | 85 (70.25) |
  Black | 184 (8.90) | 580 (28.06) | 1303 (63.04) |
  Hispanic | 5 (8.77) | 12 (21.05) | 40 (70.18) |
  White | 610 (9.12) | 1821 (27.24) | 4254 (63.64) |
I got the SMD for everything but race categories. What would be the issue?
var | lvl | ALL | ALL_2 | TRT_1 | TRT_1_2 | TRT_2 | TRT_2_2 | TRT_3 | TRT_3_2 | test_stat | test_stat_val | pval | has_missing | SMD_MNA2_2_vs_MNA2_1 | SMD_MNA2_3_vs_MNA2_1 | SMD_MNA2_3_vs_MNA2_2 |
Pop | 8758 | 1 | 784 | 0.089518 | 2382 | 0.27198 | 5592 | 0.638502 | . | |||||||
Gender | Chi2 | 2.637272 | 0.2675 | -0.03101 | 0.008678663 | 0.039693716 | ||||||||||
Female | 4414 | 0.503996 | 394 | 0.502551 | 1234 | 0.518052 | 2786 | 0.498212 | . | |||||||
Male | 4344 | 0.496004 | 390 | 0.497449 | 1148 | 0.481948 | 2806 | 0.501788 | . | |||||||
Race | Chi2 | 5.701208 | 0.4575 | 0.041915 | 0.023022193 | 0.058672359 | ||||||||||
Asian | 121 | 0.013816 | 10 | 0.012755 | 26 | 0.010915 | 85 | 0.0152 | . | |||||||
Black | 1949 | 0.222539 | 172 | 0.219388 | 558 | 0.234257 | 1219 | 0.21799 | . | |||||||
Hispanic | 57 | 0.006508 | 5 | 0.006378 | 12 | 0.005038 | 40 | 0.007153 | . | |||||||
White | 6631 | 0.757136 | 597 | 0.76148 | 1786 | 0.74979 | 4248 | 0.759657 | . | |||||||
Age | Mean (StdDev) | 72.63896 | 5.883168 | 72.52296 | 5.937476 | 73.08564 | 6.039716 | 72.46495 | 5.798456 | . | 0.093956 | -0.009885058 | -0.104841103 |
Here is the macro call:
/* Include the macro file */
%INCLUDE 'C:\FOLDER\table 1 macro.txt';
/* Call the %table1 macro */
%table1(
  personfile=test,      /* Input dataset */
  stratvars=trt,        /* Stratify by treatment group (trt) */
  rowvars=
    gender |            /* Gender as a row variable */
    race |              /* Race as a row variable */
    age/mean std |,     /* Age as a continuous variable with mean and std */
  uselabels=1,          /* Use variable labels if available */
  pvalues=1,            /* Include p-values for group comparisons */
  printSMD=1            /* Calculate and display standardized mean differences */
);
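As a sanity check on the continuous-variable SMDs, here is a small sketch that recomputes them by hand from the means and standard deviations in the Age row of the output above, using the usual two-group formula (difference in means divided by the square root of the average of the two variances). The dataset and variable names (age_summary, arm, m, s, age_smd) are made up for this check and are not part of the macro.

```sas
/* Means and standard deviations copied from the Age row of the output above */
data age_summary;
  input arm $ m s;
  datalines;
TRT_1 72.52296 5.937476
TRT_2 73.08564 6.039716
TRT_3 72.46495 5.798456
;
run;

/* Pairwise SMDs: 2 vs 1, 3 vs 1, and 3 vs 2 */
proc sql;
  create table age_smd as
  select b.arm as arm, a.arm as ref_arm,
         (b.m - a.m) / sqrt((b.s**2 + a.s**2) / 2) as smd
  from age_summary as a, age_summary as b
  where a.arm < b.arm
  order by ref_arm, arm;
quit;

proc print data=age_smd noobs;
run;
```

The results come out at roughly 0.094, -0.010 and -0.105, which agrees with the three SMD columns the macro printed for Age, up to the rounding of the printed inputs.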
Just want to add that I reran it and yes the macro worked! No missing. Thank you so much!