Hi everyone,
I am using PROC PSMATCH for propensity score matching between two treatment levels. The code below includes the data and the matching step (I have tried several matching methods), but I still cannot achieve adequate balance for one variable, Level. The univariate check after PROC PSMATCH, done with PROC NPAR1WAY, gives a p-value of 0.008 for Level by Treatment, indicating a significant difference in Level between the two treatment groups.
Could anyone suggest how to overcome this problem and improve the matching?
Much appreciated.
DATA Have;
input Age Sex $ Size Stage $ Level Marker Treatment;
DATALINES;
82 0 23 A 8 5.4 1
82 0 15 A 8 5.4 1
61 1 42 A 9 6.7 1
62 1 12 B 7 6.9 1
56 1 19 B 6 3.7 1
56 1 11 B 6 3.7 1
56 1 16 B 6 3.7 1
56 1 10 B 6 3.7 1
84 1 33 A 6 1110.5 1
64 1 34 A 9 1454.2 1
71 0 16 A 10 5.9 1
84 0 31 A 7 66 1
66 1 35 A 6 . 1
70 1 30 A . . 1
65 1 19 0 6 4.2 1
62 1 26 A 6 3.7 1
62 1 15 A 6 3.7 1
39 1 20 A 8 78.9 1
59 1 19 0 7 29.3 1
59 1 25 A 7 18.4 1
57 1 18 0 8 6.4 1
54 0 45 A 6 791.8 1
56 0 23 A 13 343.9 1
73 1 21 A 12 5.1 1
68 1 20 A 9 0.8 1
75 1 20 A 9 1.7 1
64 1 14 A 10 7.5 1
73 1 28 A 17 17 0
62 0 14 A 15 5 0
71 0 11 0 9 4.1 0
66 1 18 0 10 2.1 0
77 1 15 A 8 9.3 0
77 1 12 A 8 9.3 0
77 1 12 A 8 9.3 0
77 1 14 A 8 11.5 0
77 1 13 A 8 11.5 0
57 1 31 A 17 9 0
81 1 22 A 13 7 0
41 0 21 A 13 15 0
53 1 19 0 8 212.8 0
58 1 28 A 10 22.1 0
63 1 18 A 11 10.1 0
63 1 17 A 11 10.1 0
63 1 10 A 11 14.2 0
63 1 13 A 11 14.2 0
56 1 15 0 7 4 0
54 1 15 A 8 7.2 0
54 1 15 A 8 7.2 0
72 1 21 A 6 520.4 0
63 0 17 0 6 . 0
55 1 22 A 11 3.2 0
56 1 13 0 8 2.5 0
54 1 12 A 10 3.2 0
54 1 20 A 10 3.2 0
60 1 27 A 10 53.2 0
60 1 17 0 10 23.4 0
74 1 16 0 16 6.5 0
60 1 16 A . . 0
59 1 27 B 7 8.5 0
59 1 17 B 7 8.5 0
59 1 31 B 7 8.5 0
59 1 22 B 7 8.5 0
59 1 24 B 7 8.5 0
70 1 15 0 11 1 0
68 1 16 A 9 1 0
65 1 16 0 5 1.2 0
67 1 17 0 10 4 0
55 1 10 0 8 49 0
65 1 16 A 12 2.5 0
65 1 11 A 12 2.5 0
65 1 10 B 12 6.8 0
65 1 13 B 12 6.8 0
65 1 22 B 12 6.8 0
65 1 26 B 12 6.8 0
63 1 20 A 12 2.2 0
59 1 24 A 7 58.8 0
69 0 27 A 7 2 0
63 1 45 A 12 5607.7 0
46 1 19 0 19 13 0
59 1 18 0 14 9.9 0
55 1 12 A 12 499.5 0
53 1 13 0 12 18.3 0
60 1 13 0 7 1.9 0
75 1 23 A 7 58.4 0
74 1 12 0 7 19.5 0
58 1 15 0 9 319.7 0
68 1 23 A 10 3.1 0
51 1 20 A 8 7.6 0
51 1 18 A 8 7.6 0
52 1 20 A 7 3.6 0
66 1 18 0 6 5.8 0
69 1 20 A 10 . 0
38 1 16 0 9 10.7 0
66 1 12 B 8 61.3 0
66 1 31 B 8 61.3 0
59 1 12 0 8 18.4 0
65 1 32 B 10 9.8 0
67 1 23 A 10 10.5 0
70 1 15 0 16 5.2 0
71 1 20 A 20 81.3 0
64 0 16 A 11 18.2 0
64 0 17 A 11 18.2 0
64 0 14 A 11 18.2 0
52 0 17 0 17 . 0
60 1 17 0 7 23.8 0
59 1 16 A 10 40.6 0
59 1 12 A 10 40.6 0
58 1 26 A 11 91.1 0
50 1 34 B 11 2292.6 0
50 1 14 B 11 2292.6 0
50 1 34 B 11 2292.6 0
69 1 20 A 9 3.9 0
69 1 17 A 9 3.9 0
69 1 23 A 9 3.9 0
68 1 12 A 16 2.3 0
68 1 14 A 16 2.3 0
68 1 13 B 16 3.8 0
68 1 12 B 16 3.8 0
68 1 14 B 16 3.8 0
68 1 14 B 16 3.8 0
68 1 26 B 16 5 0
68 1 14 B 16 5 0
68 1 26 B 16 5 0
68 1 14 B 16 5 0
55 1 13 0 8 4.6 0
49 1 14 0 17 38.5 0
59 1 31 A 11 24.9 0
65 1 20 A 7 142.1 0
66 1 16 A 20 6.3 0
66 1 15 A 20 6.3 0
62 0 20 A 17 651.7 0
62 1 30 A 8 540.8 0
62 1 16 A 8 540.8 0
61 1 33 B 8 4.1 0
61 1 13 B 8 4.1 0
66 1 22 A 10 8 0
66 1 14 A 10 8 0
53 0 20 A 14 28 0
61 0 22 A 10 41.7 0
49 1 18 0 19 5 0
85 0 12 A 10 20 0
85 0 20 A 10 20 0
85 0 18 A 10 20 0
69 0 14 0 15 6.8 0
68 0 12 0 10 27 0
67 0 15 0 10 19.5 0
76 1 26 A 9 21.8 0
64 1 27 A 6 6 0
85 0 17 0 9 0.9 0
61 1 20 A 13 924.6 0
57 1 24 A 11 84.4 0
56 1 12 0 9 29.2 0
57 1 21 A 14 10.6 0
57 1 22 A 14 10.6 0
45 1 20 A 15 5.8 0
66 0 15 0 8 5.2 0
58 1 13 0 13 11.3 0
56 1 16 0 11 1.2 0
80 0 18 0 7 13 0
80 0 25 A 6 . 0
60 1 21 A 10 3.1 0
60 1 17 0 11 3 0
53 1 22 A 13 10.6 0
68 0 17 0 11 2 0
56 1 25 A 14 87 0
56 1 13 A 14 87 0
81 1 28 A 10 2.7 0
58 0 10 A 9 20.5 0
58 0 14 A 9 20.5 0
66 1 32 B 15 121.8 0
66 1 26 B 15 121.8 0
66 1 12 B 15 121.8 0
56 1 25 A 11 8.7 0
62 1 16 0 6 . 0
73 0 17 0 11 . 0
74 0 10 0 12 5.8 0
76 1 38 A 8 8.7 0
53 1 35 A 8 8720 0
60 1 26 A 21 9.6 0
68 1 19 A 7 3.8 0
68 1 17 A 7 3.8 0
81 1 20 A 6 1.5 0
81 1 14 A 6 1.5 0
;
run;
ods graphics on;

/* Fit the propensity score model and match controls to treated units */
proc psmatch data=have;
   class Stage Sex Treatment;
   psmodel Treatment(Treated='1') = Level Sex Age Stage Size Marker;
   match method=greedy(k=4) distance=lps caliper=0.20;
   /* Alternative MATCH statements that were also tried: */
   *match method=optimal(k=1) stat=lps caliper=0.20;
   *match stat=ps method=varratio(kmin=1 kmax=10) caliper=0.4;
   assess lps var=(Age Level Size Marker) / plots=(CDFPlot BoxPlot StdDiff);
   output out(obs=all)=Matched matchid=MID;
run;

ods graphics off;

/* Balance check on continuous covariates, matched observations only */
proc npar1way data=Matched wilcoxon;
   var Level Age Size Marker;
   class Treatment;
   where mid ne .;
run;

/* Balance check on categorical covariates, matched observations only */
proc freq data=Matched;
   tables treatment*(sex stage) / norow nocol chisq;
   where mid ne .;
run;
When you run your code, do you get a warning like this in the log?
WARNING: Some treated units have less than the specified K=4 matched controls because there are not enough available controls for these treated units.
If the example data set you show is all of your data, then sample size may be an issue: the combination of values of the independent variables on the PSMODEL statement almost uniquely identifies each record.
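One way to check this concern is to compare the number of rows with the number of distinct covariate patterns. The following is a minimal sketch, assuming the Have data set from the original post; the column names n_rows and n_patterns and the '|' delimiter are illustrative choices, not part of the original code.

```sas
/* Group sizes: how many treated and control units are available */
proc freq data=Have;
   tables Treatment;
run;

/* Compare the row count with the number of distinct combinations */
/* of the PSMODEL covariates (n_rows and n_patterns are made-up names) */
proc sql;
   select count(*) as n_rows,
          count(distinct catx('|', Level, Sex, Age, Stage, Size, Marker))
             as n_patterns
   from Have;
quit;
```

If n_patterns is close to n_rows, that is consistent with the point above that the covariates nearly identify individual records.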
Thanks @ballardw
This is the entire dataset I have.
Could I ask how you would approach such an issue? If you cannot match using propensity scores, what are the alternatives?
Would inverse probability of treatment weighting (IPTW) be an alternative?
Thanks.
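For reference, inverse probability of treatment weights can be built by hand from a logistic propensity model. This is only a sketch under stated assumptions, not a recommendation for this data set: it reuses the covariates from the PSMODEL statement above, and the names Scored, IPTW, ps, and iptw are hypothetical. PROC PSMATCH also offers propensity score weighting; PROC LOGISTIC is swapped in here so that each step is explicit.

```sas
/* Step 1: estimate the propensity score P(Treatment=1 | covariates) */
proc logistic data=Have;
   class Sex Stage / param=ref;
   model Treatment(event='1') = Level Sex Age Stage Size Marker;
   output out=Scored p=ps;   /* ps is a hypothetical variable name */
run;

/* Step 2: weights for the average treatment effect (ATE) */
data IPTW;
   set Scored;
   if ps ne .;               /* rows with missing covariates have no score */
   if Treatment = 1 then iptw = 1 / ps;
   else iptw = 1 / (1 - ps);
run;

/* Step 3: weighted means by group as a rough balance check */
proc means data=IPTW mean std;
   class Treatment;
   var Level Age Size Marker;
   weight iptw;
run;
```

With a small sample, some scores may fall close to 0 or 1 and produce very large weights, so the distribution of iptw is worth inspecting before any outcome analysis.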
I might try reducing the number of variables on the PSMODEL statement. I don't know what most of those variables mean; I am only moderately confident about Sex and Age. You might get slightly better results by grouping ages, for example into 5- or 10-year groups, unless you have other information about how age affects this process.
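The suggestion above could be sketched as follows. The names Have2, AgeGrp, and Matched2 are hypothetical, the 10-year band width is one of the two widths mentioned, and the choice to drop Size and Marker is purely illustrative, since which variables matter depends on subject knowledge not given in this thread. The k=1 setting is likewise an assumption, chosen to ask for fewer controls per treated unit.

```sas
/* Group age into 10-year bands, e.g. 60 for ages 60-69 */
data Have2;
   set Have;
   if Age ne . then AgeGrp = 10 * floor(Age / 10);
run;

/* Refit with a smaller propensity model (illustrative variable choice) */
proc psmatch data=Have2;
   class Stage Sex Treatment AgeGrp;
   psmodel Treatment(Treated='1') = Level Sex AgeGrp Stage;
   match method=greedy(k=1) distance=lps caliper=0.20;
   assess lps var=(Level) / plots=(StdDiff);
   output out(obs=match)=Matched2 matchid=MID;
run;
```

The standardized difference for Level reported by the ASSESS statement can then be compared with the one from the original six-variable model.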