Hi,
I have two different cohorts (0 and 1), in which participants could receive treatment A or B. I want to compare baseline characteristics between those receiving treatment A vs. B in a merged dataset (data=have) containing individuals from cohort 0 and 1.
The mix of patients receiving treatment A vs. B differs between cohort 0 and cohort 1. To account for this, I was thinking about weighting by the probability of being in cohort 1 when comparing treatment A vs. B in the merged dataset (see below). But is there a smarter approach? Maybe random sampling from the merged dataset? Inputs/comments are highly valued. Thanks!
proc logistic data=have;
class cohort sex;
model cohort(event='1') = age sex / stb;
output out=want prob=prob;
run;
proc means data=want median q1 q3;
class treatment;
var age;
weight prob;
run;
For comparing characteristics of different treatment conditions, you might consider using the ASSESS statement in PROC PSMATCH. You can use PROC PSMATCH to produce graphical summaries for assessing balance after stratifying on the predicted probability of receiving treatment, inverse probability weighting (IPW), or matching. For example, the code below would produce balance assessments that incorporate the IPW-ATT weights:
proc psmatch data=have region=allobs;
class cohort sex;
psmodel cohort(treated='1')= age sex;
assess ps var=(age sex) / plots=all;
output out=want weight=attwgt;
run;
For more information about inverse probability weighting in PROC PSMATCH, you can look at the Propensity Score Weighting section or Example 1 in the PROC PSMATCH documentation. Note that the weight= option used in the example code I provided and the PSWEIGHT statement used in the documentation example are new syntax introduced in SAS/STAT 15.1. The documentation for previous releases is also available online here:
http://support.sas.com/documentation/onlinedoc/stat/index.html
I don't think LOGISTIC is necessary or appropriate here. LOGISTIC is for cases when your response variable(s) are binary; you don't have that here.
I think PROC SURVEYMEANS will work better. It can compare means if the samples are weighted by the probability that an individual is in a cohort. Of course, this assumes I know what you mean by "I was thinking about weighting by the probability of being in cohort 1", and I don't really know. You haven't explained how the design of the study produces individuals in cohort 0 or cohort 1, so please explain the design of the study further.
I can see my wording was equivocal - sorry about that. The event in the logistic model is cohort (0 vs. 1 - binary) with explanatory variables a-z, which calculates the probability of being in cohort 1 (event) given explanatory variables a-z.
In other words, when I weight by this probability in the baseline comparisons of treatment A vs. B in the merged dataset, the p-values are adjusted (weighted) by the probability of being in cohort 1 based on the differences in explanatory variables relative to cohort 0.
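For comparison, a common alternative to weighting by the probability itself is to weight by its inverse, so that each cohort is reweighted toward the covariate mix of the merged population. The following is only a minimal sketch of that idea, not the approach described above. It assumes that WANT is the output dataset from the PROC LOGISTIC step, that cohort is numeric 0/1, and that prob is the predicted probability of being in cohort 1; the dataset want_ipw and the variable ipw are hypothetical names.

```sas
/* Sketch only: assumes WANT comes from the PROC LOGISTIC step above, */
/* with prob = predicted P(cohort = 1 | age, sex) and cohort numeric. */
data want_ipw;
   set want;
   /* ipw (hypothetical name): inverse of the probability of the */
   /* cohort the patient actually belongs to                     */
   if cohort = 1 then ipw = 1 / prob;
   else if cohort = 0 then ipw = 1 / (1 - prob);
run;

/* Weighted baseline summary by treatment, mirroring the original step */
proc means data=want_ipw median q1 q3;
   class treatment;
   var age;
   weight ipw;
run;
```

Very small or very large predicted probabilities produce extreme weights, so it is worth inspecting the distribution of ipw before relying on the weighted summaries.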
My problem is:
1. The frequency (n) of patients receiving treatment A is greater in cohort 1 vs. 0.
2. Cohort 1 is "healthier" than 0.
This introduces pseudo-differences between treatment A and B as a function of the relative "oversampling" in cohort 1. My question is therefore: is the above modelling the best way to account for differences between treatment A and B in a merged dataset of cohorts 0 and 1, which differ on baseline variables?
Maybe random sampling (stratified by cohort?) from the merged dataset would be better? The number of patients receiving A vs. B would then not be biased by the oversampling of treatment A in cohort 1.
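For reference, random sampling within each cohort could be sketched with PROC SURVEYSELECT and a STRATA statement, which is the stratified counterpart of sampling "by cohort". This is only an illustration under stated assumptions: the sample size of 200 per cohort and the seed are placeholder values, and have_sorted and have_sample are hypothetical dataset names.

```sas
/* Sketch only: sampsize=200 and the seed are placeholder values. */
/* PROC SURVEYSELECT expects the input sorted by the STRATA variable. */
proc sort data=have out=have_sorted;
   by cohort;
run;

/* Draw a simple random sample of equal size from each cohort */
proc surveyselect data=have_sorted out=have_sample
                  method=srs sampsize=200 seed=20240101;
   strata cohort;
run;
```

If a cohort has fewer observations than the requested sample size, the step stops with an error unless the SELECTALL option is added. Note also that equal-sized samples only equalize the cohort sizes; they do not by themselves change the treatment mix or the baseline differences within each cohort.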
It seems that you want some sort of statistical analyses where some variables are considered independent variables (or predictor variables, or x-variables) and other variables are considered dependent variables (or response variables, or y-variables).
Now I am still confused about which are the x-variables and which are the y-variables. It seems to me that cohort and A vs. B are x-variables, and some measure of health is a y-variable here. Could you comment on this?
What you describe sounds like something that requires causal analysis - probably the method that is implemented in PROC CAUSALTRT. See the discussion and examples in the documentation for that procedure.