Hello! I'm trying to perform a propensity score analysis using the gmatch macro by Mayo Clinic (http://bioinformaticstools.mayo.edu/research/gmatch/), and I get some errors I can't figure out how to handle at all. I've written a "proc logistic" step to create the propensity scores, and it works well:

/* Create Propensity Scores */
proc logistic data= NB.DATA;
class INSS Sex ;
model TreoMel = INSS
Sex /
link=glogit rsquare;
output out = out_ps pred = ps xbeta=logit_ps;
/* Output the propensity score and the logit of the propensity score */
run;

But there are problems with the later steps, in the macro itself. Specifically, there are four errors:
ERROR: File WORK.NB.DATA does not exist.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &NCA*1
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was: &NCO < %EVAL(&NCA*&NCONTLS)
ERROR: The macro GREEDY will stop executing.

Here is the code:
/*************************************************************************/
/* Compute standard deviation of the logit of the propensity score */
/*************************************************************************/
proc means std data=out_ps;
var logit_ps;
output out=stddata (keep = std) std=std;
run;
data stddata;
set stddata;
std = 0.2*std;
/* calipers of width 0.2 standard deviations of the logit of PS. */
run;
/* Create macro variable that contains the width of the caliper for matching */
data _null_;
set stddata;
call symput('stdcal',std);
run;
/* Match subjects on the logit of the propensity score. */
proc sort data=out_ps;
by TreoMel;
run;
data out_ps;
set out_ps;
id=_N_;
run;
%include '/folders/myfolders/gmatch.sas';
/* The macro %gmatch.sas uses the following parameters:
Data: the name of the SAS data set containing the treated and untreated subjects.
Group: the variable identifying treated/untreated subjects.
Id: the variable denoting subjects’ identification numbers.
Mvars: the list of variables on which one is matching.
Wts: the list of non-negative weights corresponding to each matching variable.
Dist: the type of distance to calculate [1 indicates weighted sum (over matching
variables) of absolute case-control differences].
Dmaxk: the maximum allowable difference in the matching difference between matched
treated and untreated subjects.
Ncontls: the number of untreated subjects to be matched to each treated subject.
Seedca: the random number seed for sorting the treated subjects prior to matching.
Seedco: the random number seed for sorting the untreated subjects prior to
matching.
Out: the name of a SAS data set containing the matched sample.
Print: the flag indicating whether the matched data should be printed. */
%gmatch(
data = NB,
group = TreoMel,
id = id,
mvars = logit_ps,
wts = 1,
dist = 1,
dmaxk = &stdcal,
ncontls = 1,
seedca = 25102007,
seedco = 26102007,
out = matchpairs,
print = F
);
data matchpairs;
set matchpairs;
pair_id = _N_;
run;
/* Create a data set containing the matched BMS patients (untreated subjects) */
data control_match;
set matchpairs;
control_id = __IDCO;
logit_ps = __CO1;
keep pair_id control_id logit_ps;
run;
/* Create a data set containing the matched TreoMel patients (treated subjects) */
data case_match;
set matchpairs;
case_id = __IDCA;
logit_ps = __CA1;
keep pair_id case_id logit_ps;
run;
proc sort data=control_match;
by control_id;
run;
proc sort data=case_match;
by case_id;
run;
data exposed;
set out_ps;
if TreoMel = 1;
case_id = id;
run;
data control;
set out_ps;
if TreoMel = 0;
control_id = id;
run;
proc sort data=exposed;
by case_id;
run;
proc sort data=control;
by control_id;
run;
data control_match;
merge control_match (in=f1) control (in=f2);
by control_id;
if f1 and f2;
run;
data case_match;
merge case_match (in=f1) exposed (in=f2);
by case_id;
if f1 and f2;
run;
data long;
set control_match case_match;
prop_score = exp(logit_ps) / (exp(logit_ps) + 1);
run;
/*
data wide_TreoMel;
set case_match;
death_1_yr_TreoMel = death_1_yr;
tvra_time_TreoMel = tvra_time;
tvra_TreoMel = tvra;
run;
data wide_bms;
set control_match;
death_1_yr_bms = death_1_yr;
tvra_time_bms = tvra_time;
tvra_bms = tvra;
run;
proc sort data=wide_TreoMel;
by pair_id;
run;
proc sort data=wide_bms;
by pair_id;
run;
/* Data set containing outcomes for the matched subjects. */
/* Each row contains outcomes for the treated and untreated subjects */
/* in the matched pair. */
/*data wide_combo;
merge wide_TreoMel (in=f1) wide_bms (in=f2);
by pair_id;
if f1 and f2;
run;
*/
/******************************************************************************/
/* Compute standardized differences for each covariate in the matched sample. */
/******************************************************************************/
/*proc sort data=long;
by TreoMel;
run;*/
/******************************************************************************/
/* Macro for computing standardized differences for continuous variables. */
/******************************************************************************/
%macro cont(var=,label=);
proc means mean stddev data=long noprint;
var &var;
by TreoMel;
output out=outmean (keep = TreoMel mean stddev) mean = mean stddev=stddev;
run;
data TreoMel0;
set outmean;
if TreoMel = 0;
mean_0 = mean;
s_0 = stddev;
keep mean_0 s_0;
run;
data TreoMel1;
set outmean;
if TreoMel = 1;
mean_1 = mean;
s_1 = stddev;
keep mean_1 s_1;
run;
data newdata;
length label $ 25;
merge TreoMel0 TreoMel1;
d = (mean_1 - mean_0)/ sqrt((s_1*s_1 + s_0*s_0)/2);
d = round(abs(d),0.001);
label = &label;
keep d label;
run;
proc append data=newdata base=standiff force;
run;
%mend cont;
/******************************************************************************/
/* Macro for computing standardized differences for binary variables. */
/******************************************************************************/
%macro binary(var=,label=);
proc means mean data=long noprint;
var &var;
by TreoMel;
output out=outmean (keep = TreoMel mean) mean = mean;
run;
data TreoMel0;
set outmean;
if TreoMel = 0;
mean_0 = mean;
keep mean_0;
run;
data TreoMel1;
set outmean;
if TreoMel = 1;
mean_1 = mean;
keep mean_1;
run;
data newdata;
length label $ 25;
merge TreoMel0 TreoMel1;
d = (mean_1 - mean_0)/ sqrt((mean_1*(1-mean_1) + mean_0*(1-mean_0))/2);
d = round(abs(d),0.001);
label = &label;
keep d label;
run;
proc append data=newdata base=standiff force;
run;
%mend binary;
%cont(var=sex,label="sex");
proc print data=standiff;
title 'Standardized differences in propensity score matched sample';
run;

The log:
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
72
73 OPTIONS VALIDVARNAME=V7;
74 LIBNAME NB XLSX "/folders/myfolders/НБ ауто ДОГиТ+РДКБ.xlsx";
NOTE: Libref NB was successfully assigned as follows:
Engine: XLSX
Physical Name: /folders/myfolders/НБ ауто ДОГиТ+РДКБ.xlsx
75 run;
76
77
78
79 /* Create Propensity Scores */
80 proc logistic data= NB.DATA;
/*some variable name changes are omitted*/
81 class INSS Sex ;
82 model TreoMel = INSS
83 Sex /
84 link=glogit rsquare;
85 output out = out_ps pred = ps xbeta=logit_ps;
86 /* Output the propensity score and the logit of the propensity score */
87 run;
NOTE: The import data set has 1007 observations and 91 variables.
NOTE: The import data set has 1007 observations and 91 variables.
NOTE: PROC LOGISTIC is fitting the generalized logit model. The logits modeled contrast each response level against the reference
level (TreoMel='1'). Use the response variable option REF= if you want to change the reference level.
WARNING: There is possibly a quasi-complete separation of data points. The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last maximum likelihood
iteration. Validity of the model fit is questionable.
NOTE: The import data set has 1007 observations and 91 variables.
NOTE: The import data set has 1007 observations and 91 variables.
NOTE: There were 1007 observations read from the data set NB.DATA.
NOTE: The data set WORK.OUT_PS has 1007 observations and 94 variables.
NOTE: PROCEDURE LOGISTIC used (Total process time):
real time 2.09 seconds
cpu time 2.01 seconds
88
89
90 /*************************************************************************/
91 /* Compute standard deviation of the logit of the propensity score */
92 /*************************************************************************/
93 proc means std data=out_ps;
94 var logit_ps;
95 output out=stddata (keep = std) std=std;
96 run;
NOTE: There were 1007 observations read from the data set WORK.OUT_PS.
NOTE: The data set WORK.STDDATA has 1 observations and 1 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.07 seconds
cpu time 0.06 seconds
97 data stddata;
98 set stddata;
99 std = 0.2*std;
100 /* calipers of width 0.2 standard deviations of the logit of PS. */
101 run;
NOTE: There were 1 observations read from the data set WORK.STDDATA.
NOTE: The data set WORK.STDDATA has 1 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
102 /* Create macro variable that contains the width of the caliper for matching */
103 data _null_;
104 set stddata;
105 call symput('stdcal',std);
106 run;
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
105:22
NOTE: There were 1 observations read from the data set WORK.STDDATA.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
107 /* Match subjects on the logit of the propensity score. */
108 proc sort data=out_ps;
109 by TreoMel;
110 run;
NOTE: There were 1007 observations read from the data set WORK.OUT_PS.
NOTE: The data set WORK.OUT_PS has 1007 observations and 94 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.03 seconds
cpu time 0.04 seconds
111 data out_ps;
112 set out_ps;
113 id=_N_;
114 run;
NOTE: There were 1007 observations read from the data set WORK.OUT_PS.
NOTE: The data set WORK.OUT_PS has 1007 observations and 95 variables.
NOTE: DATA statement used (Total process time):
real time 0.04 seconds
cpu time 0.05 seconds
115
116 %include '/folders/myfolders/gmatch.sas';
804 /* The macro %gmatch.sas uses the following parameters:
805 Data: the name of the SAS data set containing the treated and untreated subjects.
806 Group: the variable identifying treated/untreated subjects.
807 Id: the variable denoting subjects’ identification numbers.
808 Mvars: the list of variables on which one is matching.
809 Wts: the list of non-negative weights corresponding to each matching variable.
810 Dist: the type of distance to calculate [1 indicates weighted sum (over matching
811 variables) of absolute case-control differences].
812 Dmaxk: the maximum allowable difference in the matching difference between matched
813 treated and untreated subjects.
814 Ncontls: the number of untreated subjects to be matched to each treated subject.
815 Seedca: the random number seed for sorting the treated subjects prior to matching.
816 Seedco: the random number seed for sorting the untreated subjects prior to
817 matching.
818 Out: the name of a SAS data set containing the matched sample.
819 Print: the flag indicating whether the matched data should be printed. */
820 %gmatch(
821 data = NB,
822 group = TreoMel,
823 id = id,
824 mvars = logit_ps,
825 wts = 1,
826 dist = 1,
827 dmaxk = &stdcal,
828 ncontls = 1,
829 seedca = 25102007,
830 seedco = 26102007,
831 out = matchpairs,
832 print = F
833 );
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
ERROR: File WORK.NB.DATA does not exist.
NOTE: Character values have been converted to numeric values at the places given by: (Line):(Column).
833:74
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set WORK.__CHECK may be incomplete. When this step was stopped there were 0 observations and 3 variables.
WARNING: Data set WORK.__CHECK was not replaced because this step was stopped.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
NOTE: There were 0 observations read from the data set WORK.__CHECK.
NOTE: The data set WORK._CACO has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: Variable TreoMel is uninitialized.
NOTE: There were 0 observations read from the data set WORK._CACO.
NOTE: The data set WORK.__CASE has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
833:43
NOTE: There were 0 observations read from the data set WORK.__CASE.
NOTE: The data set WORK.__CASE has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Input data set is empty.
NOTE: The data set WORK.__CASE has 0 observations and 4 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.03 seconds
NOTE: Variable TreoMel is uninitialized.
NOTE: There were 0 observations read from the data set WORK._CACO.
NOTE: The data set WORK.__CONT has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column).
833:43
NOTE: There were 0 observations read from the data set WORK.__CONT.
NOTE: The data set WORK.__CONT has 0 observations and 4 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
WARNING: Apparent symbolic reference NCO not resolved.
WARNING: Apparent symbolic reference NCA not resolved.
WARNING: Apparent symbolic reference NCA not resolved.
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
&NCA*1
ERROR: A character operand was found in the %EVAL function or %IF condition where a numeric operand is required. The condition was:
&NCO < %EVAL(&NCA*&NCONTLS)
ERROR: The macro GREEDY will stop executing.
834 data matchpairs;
835 set matchpairs;
836 pair_id = _N_;
837 run;
NOTE: There were 0 observations read from the data set WORK.MATCHPAIRS.
NOTE: The data set WORK.MATCHPAIRS has 0 observations and 1 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
838 /* Create a data set containing the matched BMS patients (untreated subjects) */
839 data control_match;
840 set matchpairs;
841 control_id = __IDCO;
842 logit_ps = __CO1;
843 keep pair_id control_id logit_ps;
844 run;
NOTE: Variable __IDCO is uninitialized.
NOTE: Variable __CO1 is uninitialized.
NOTE: There were 0 observations read from the data set WORK.MATCHPAIRS.
NOTE: The data set WORK.CONTROL_MATCH has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
845 /* Create a data set containing the matched TreoMel patients (treated subjects) */
846 data case_match;
847 set matchpairs;
848 case_id = __IDCA;
849 logit_ps = __CA1;
850 keep pair_id case_id logit_ps;
851 run;
NOTE: Variable __IDCA is uninitialized.
NOTE: Variable __CA1 is uninitialized.
NOTE: There were 0 observations read from the data set WORK.MATCHPAIRS.
NOTE: The data set WORK.CASE_MATCH has 0 observations and 3 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
852 proc sort data=control_match;
853 by control_id;
854 run;
NOTE: Input data set is empty.
NOTE: The data set WORK.CONTROL_MATCH has 0 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
855 proc sort data=case_match;
856 by case_id;
857 run;
NOTE: Input data set is empty.
NOTE: The data set WORK.CASE_MATCH has 0 observations and 3 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
858 data exposed;
859 set out_ps;
860 if TreoMel = 1;
861 case_id = id;
862 run;
NOTE: There were 1007 observations read from the data set WORK.OUT_PS.
NOTE: The data set WORK.EXPOSED has 91 observations and 96 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
863 data control;
864 set out_ps;
865 if TreoMel = 0;
866 control_id = id;
867 run;
NOTE: There were 1007 observations read from the data set WORK.OUT_PS.
NOTE: The data set WORK.CONTROL has 130 observations and 96 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
868 proc sort data=exposed;
869 by case_id;
870 run;
NOTE: There were 91 observations read from the data set WORK.EXPOSED.
NOTE: The data set WORK.EXPOSED has 91 observations and 96 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.02 seconds
cpu time 0.03 seconds
871 proc sort data=control;
872 by control_id;
873 run;
NOTE: There were 130 observations read from the data set WORK.CONTROL.
NOTE: The data set WORK.CONTROL has 130 observations and 96 variables.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
874 data control_match;
875 merge control_match (in=f1) control (in=f2);
876 by control_id;
877 if f1 and f2;
878 run;
NOTE: There were 0 observations read from the data set WORK.CONTROL_MATCH.
NOTE: There were 130 observations read from the data set WORK.CONTROL.
NOTE: The data set WORK.CONTROL_MATCH has 0 observations and 97 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
879 data case_match;
880 merge case_match (in=f1) exposed (in=f2);
881 by case_id;
882 if f1 and f2;
883 run;
NOTE: There were 0 observations read from the data set WORK.CASE_MATCH.
NOTE: There were 91 observations read from the data set WORK.EXPOSED.
NOTE: The data set WORK.CASE_MATCH has 0 observations and 97 variables.
NOTE: DATA statement used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
884 data long;
885 set control_match case_match;
886 prop_score = exp(logit_ps) / (exp(logit_ps) + 1);
887 run;
NOTE: There were 0 observations read from the data set WORK.CONTROL_MATCH.
NOTE: There were 0 observations read from the data set WORK.CASE_MATCH.
NOTE: The data set WORK.LONG has 0 observations and 99 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.02 seconds
888 /*
889 data wide_TreoMel;
890 set case_match;
891 death_1_yr_TreoMel = death_1_yr;
892 tvra_time_TreoMel = tvra_time;
893 tvra_TreoMel = tvra;
894 run;
895 data wide_bms;
896 set control_match;
897 death_1_yr_bms = death_1_yr;
898 tvra_time_bms = tvra_time;
899 tvra_bms = tvra;
900 run;
901 proc sort data=wide_TreoMel;
902 by pair_id;
903 run;
904 proc sort data=wide_bms;
905 by pair_id;
906 run;
907 /* Data set containing outcomes for the matched subjects. */
908 /* Each row contains outcomes for the treated and untreated subjects */
909 /* in the matched pair. */
910 /*data wide_combo;
911 merge wide_TreoMel (in=f1) wide_bms (in=f2);
912 by pair_id;
913 if f1 and f2;
914 run;
915 */
916
917 /******************************************************************************/
918 /* Compute standardized differences for each covariate in the matched sample. */
919 /******************************************************************************/
920 /*proc sort data=long;
921 by TreoMel;
922 run;*/
923 /******************************************************************************/
924 /* Macro for computing standardized differences for continuous variables. */
925 /******************************************************************************/
926 %macro cont(var=,label=);
927 proc means mean stddev data=long noprint;
928 var &var;
929 by TreoMel;
930 output out=outmean (keep = TreoMel mean stddev) mean = mean stddev=stddev;
931 run;
932 data TreoMel0;
933 set outmean;
934 if TreoMel = 0;
935 mean_0 = mean;
936 s_0 = stddev;
937 keep mean_0 s_0;
938 run;
939 data TreoMel1;
940 set outmean;
941 if TreoMel = 1;
942 mean_1 = mean;
943 s_1 = stddev;
944 keep mean_1 s_1;
945 run;
946 data newdata;
947 length label $ 25;
948 merge TreoMel0 TreoMel1;
949 d = round(abs(d),0.001);
950 label = &label;
951 keep d label;
952 run;
953 proc append data=newdata base=standiff force;
954 run;
955 %mend cont;
956 /******************************************************************************/
957 /* Macro for computing standardized differences for binary variables. */
958 /******************************************************************************/
959 %macro binary(var=,label=);
960 proc means mean data=long noprint;
961 var &var;
962 by TreoMel;
963 output out=outmean (keep = TreoMel mean) mean = mean;
964 run;
965 data TreoMel0;
966 set outmean;
967 if TreoMel = 0;
968 mean_0 = mean;
969 keep mean_0;
970 run;
971 data TreoMel1;
972 set outmean;
973 if TreoMel = 1;
974 mean_1 = mean;
975 keep mean_1;
976 run;
977 data newdata;
978 length label $ 25;
979 merge TreoMel0 TreoMel1;
980 d = (mean_1 - mean_0)/ sqrt((mean_1*(1-mean_1) + mean_0*(1-mean_0))/2);
981 d = round(abs(d),0.001);
982 label = &label;
983 keep d label;
984 run;
985 proc append data=newdata base=standiff force;
986 run;
987 %mend binary;
988 %cont(var=sex,label="sex");
NOTE: No observations in data set WORK.LONG.
NOTE: The data set WORK.OUTMEAN has 0 observations and 3 variables.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: There were 0 observations read from the data set WORK.OUTMEAN.
NOTE: The data set WORK.TREOMEL0 has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds
NOTE: There were 0 observations read from the data set WORK.OUTMEAN.
NOTE: The data set WORK.TREOMEL1 has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
NOTE: There were 0 observations read from the data set WORK.TREOMEL0.
NOTE: There were 0 observations read from the data set WORK.TREOMEL1.
NOTE: The data set WORK.NEWDATA has 0 observations and 2 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
NOTE: Appending WORK.NEWDATA to WORK.STANDIFF.
NOTE: There were 0 observations read from the data set WORK.NEWDATA.
NOTE: 0 observations added.
NOTE: The data set WORK.STANDIFF has 0 observations and 2 variables.
NOTE: PROCEDURE APPEND used (Total process time):
real time 0.00 seconds
cpu time 0.01 seconds
989 proc print data=standiff;
990 title 'Standardized differences in propensity score matched sample';
991 run;
NOTE: No observations in data set WORK.STANDIFF.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds
992
993 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
1005

I really have no idea where the file WORK.NB.DATA comes from. I'm grateful for your help.
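For reference, the program computes a caliper of 0.2 times the standard deviation of logit_ps. That value is stored in &stdcal and passed to gmatch as dmaxk. The data long step then converts logit_ps back to a probability. Both calculations can be sketched in Python as shown below. This is a minimal illustration outside SAS, not part of the program above. The function names caliper_width and inverse_logit are hypothetical, and the example logits are made up.

```python
import math
import statistics


def caliper_width(logit_ps, fraction=0.2):
    """Caliper = fraction * sample standard deviation of the logit of the PS.

    statistics.stdev uses the n - 1 divisor, which matches the PROC MEANS
    default (VARDEF=DF).
    """
    return fraction * statistics.stdev(logit_ps)


def inverse_logit(logit):
    """Same transformation as prop_score = exp(logit_ps) / (exp(logit_ps) + 1)."""
    return math.exp(logit) / (math.exp(logit) + 1.0)


if __name__ == "__main__":
    example_logits = [-1.2, -0.4, 0.0, 0.3, 1.1]  # made-up values
    print(caliper_width(example_logits))
    print(inverse_logit(0.0))  # 0.5
```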
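The %gmatch call is meant to pair treated and untreated subjects greedily, 1:1, on logit_ps, with dmaxk acting as the caliper. With dist = 1, a single matching variable and weight 1, the distance reduces to the absolute difference in logit_ps. The sketch below shows generic greedy nearest-neighbour matching without replacement under such a caliper. It is not the gmatch source, and gmatch's exact ordering of subjects (driven by seedca and seedco) may differ. The function name greedy_match and its dict-based inputs are assumptions made for illustration.

```python
import random


def greedy_match(cases, controls, caliper, seed=None):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    cases, controls: dicts mapping subject id -> logit of the propensity score.
    Cases are visited in a random order (reproducible via seed); each one takes
    the closest still-unused control whose absolute difference is <= caliper.
    Cases with no eligible control stay unmatched.
    Returns a list of (case_id, control_id) pairs.
    """
    rng = random.Random(seed)
    case_order = list(cases)
    rng.shuffle(case_order)

    available = dict(controls)  # controls not used yet
    pairs = []
    for case_id in case_order:
        best_id, best_dist = None, None
        for control_id, value in available.items():
            dist = abs(cases[case_id] - value)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = control_id, dist
        if best_id is not None:
            pairs.append((case_id, best_id))
            del available[best_id]  # matching without replacement
    return pairs


if __name__ == "__main__":
    treated = {1: 0.10, 2: 0.55, 3: 2.00}  # made-up logits
    untreated = {10: 0.12, 11: 0.50, 12: -1.00}
    print(greedy_match(treated, untreated, caliper=0.1, seed=25102007))
```

With these made-up values, cases 1 and 2 are matched to controls 10 and 11. Case 3 has no control within the caliper and stays unmatched. Only the order of the printed pairs depends on the seed.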
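The %binary macro computes the standardized difference |p_1 - p_0| / sqrt((p_1(1 - p_1) + p_0(1 - p_0))/2) from the group means of a 0/1 variable. For continuous variables, the usual counterpart divides the difference in means by the pooled standard deviation sqrt((s_1^2 + s_0^2)/2). Treating that as what %cont is intended to compute is an assumption here. Below is a minimal Python sketch of both formulas, rounded to 0.001 as in the macros. The function names are hypothetical.

```python
import math


def std_diff_continuous(mean_1, s_1, mean_0, s_0):
    """Absolute standardized difference for a continuous covariate (pooled SD)."""
    d = (mean_1 - mean_0) / math.sqrt((s_1 ** 2 + s_0 ** 2) / 2.0)
    return round(abs(d), 3)  # three decimals, like round(abs(d), 0.001)


def std_diff_binary(p_1, p_0):
    """Absolute standardized difference for a binary covariate (group proportions)."""
    d = (p_1 - p_0) / math.sqrt((p_1 * (1 - p_1) + p_0 * (1 - p_0)) / 2.0)
    return round(abs(d), 3)


if __name__ == "__main__":
    print(std_diff_continuous(5.0, 3.0, 4.0, 1.0))  # 0.447
    print(std_diff_binary(0.6, 0.4))                # 0.408
```

If both proportions are 0, or both are 1, the denominator is zero. This sketch then raises ZeroDivisionError, whereas the SAS data step would set d to missing and write a division-by-zero note.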