Hello fellow SAS users,

I'm hoping someone can help. It's not urgent, since I have written the code and it works fine. However, I need to repeat this code for 3 similar tests on 20-30 different sample combinations, each combination having tens of millions of observations, so it could get rather lengthy. The code is below, and here are (essentially) the steps it follows:

*POOLED PRE/POST FOR TREATMENT AND BENCHMARK FIRMS* = Combine the necessary data sets for the sample partitions (in this case, four), which come from a bootstrap procedure that generated 1,000 random samples.
*RUN REGRESSION AND OBTAIN RESIDUALS FOR CHG IN NI* = The tests use the residuals from the OLS regression model. The regressions are run by replicate (from the bootstrap).
*OBTAIN VARIANCES* = For each regression and each sample partition, I obtain the variance of the residuals. Since there are 1,000 replicates, there are 1,000 variances for each sample partition.
*TEST FOR PRE/POST DIFFERENCE IN VARIANCES* = A t-test for the difference in the means of the 1,000 variances between pre and post, for each of the two types of firms.
*TO CALCULATE DIFFERENCE-IN-DIFFERENCES* = Final step: calculate the post-minus-pre difference for each of the 1,000 replicates above, then test whether those differences differ between the two types of firms.

ODS HTML CLOSE; /*CLOSE PREVIOUS*/
ODS HTML; /*OPEN NEW*/
*POOLED PRE/POST FOR TREATMENT AND BENCHMARK FIRMS*;;
PROC SORT DATA=PRETR_R; BY DSCD FYEAR; RUN; PROC SORT DATA=POSTTR_R; BY DSCD FYEAR; RUN;
PROC SORT DATA=PREBN_R; BY DSCD FYEAR; RUN; PROC SORT DATA=POSTBN_R; BY DSCD FYEAR; RUN;
DATA POOLED1;
MERGE PRETR_R POSTTR_R PREBN_R POSTBN_R;
BY DSCD FYEAR;
RUN;
proc sort data=pooled1; by replicate; run;
*TEST 1 -- FULL SAMPLE*;
*RUN REGRESSION AND OBTAIN RESIDUALS FOR CHG IN NI;;
PROC REG DATA=pooled1 PLOTS=NONE NOPRINT;
BY REPLICATE;
MODEL NI_D = GROWTH EISSUE LEV DISSUE TURN SIZE CF_S NUMEX XLIST CLOSE ROL;
OUTPUT OUT=POOLEDTR_REG1
R=NI_DRESID;
RUN;
*OBTAIN VARIANCES*;;
PROC SORT DATA=POOLEDTR_REG1; BY POST TYPE2 REPLICATE; RUN;
PROC MEANS NOPRINT DATA=POOLEDTR_REG1 VAR;
BY POST TYPE2 REPLICATE;
VAR NI_DRESID;
OUTPUT OUT=POOLEDTR_NI VAR= NI_VAR ;
RUN;
*TEST FOR PRE/POST DIFFERENCE IN VARIANCES*;;
PROC SORT DATA=POOLEDTR_NI; BY TYPE2; RUN;
PROC TTEST DATA=POOLEDTR_NI PLOTS=NONE;
VAR NI_VAR;
CLASS post ;
by type2;
RUN;
*TO CALCULATE AND TEST DIFFERENCE-IN-DIFFERENCES*;;
DATA POOLEDTR_NI2;
SET POOLEDTR_NI;
BY TYPE2 POST REPLICATE;
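LAG = LAG1000(NI_VAR); /* PRE-PERIOD VALUE 1,000 ROWS BACK: ASSUMES EXACTLY 1,000 REPLICATES PER TYPE2/POST CELL, IN THE SAME ORDER */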
DIFF = NI_VAR-LAG;
IF POST=0 THEN DELETE;
RUN;
PROC TTEST DATA=POOLEDTR_NI2 PLOTS=NONE;
VAR DIFF;
CLASS TYPE2 ;
RUN;

As I said, this code WORKS. My concern is that it is not EFFICIENT, considering how many data sets this one test creates and that I have to repeat it another 20-30 times. So if anyone can help simplify or consolidate this code, I would greatly appreciate it. Thanks in advance!
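A possible simplification of the pooling step, sketched under an assumption: if PRETR_R, POSTTR_R, PREBN_R and POSTBN_R share the same variables (so the intent is to stack the partitions rather than join them side by side), SET can take the place of MERGE. Stacking needs no BY key, so the four sorts by DSCD FYEAR drop out, and a DATA step view lets the one remaining sort by REPLICATE read the rows without first writing POOLED1 to disk. The stand-in data sets (including their TYPE2 coding) and the view name POOLED1_V are invented for illustration.

```sas
/* Invented stand-ins for the four partitions so the sketch runs on its */
/* own; leave this step out when the real data sets already exist.     */
data pretr_r posttr_r prebn_r postbn_r;
  call streaminit(2024);
  do type2 = 0 to 1;
    do post = 0 to 1;
      do replicate = 1 to 3;
        do dscd = 1 to 5;
          fyear = 2000 + post;
          ni_d  = rand('normal');
          if type2 = 1 and post = 0 then output pretr_r;
          else if type2 = 1 then output posttr_r;
          else if post = 0 then output prebn_r;
          else output postbn_r;
        end;
      end;
    end;
  end;
run;

/* Stack the partitions through a view: SET appends rows, so no BY key */
/* or pre-sorting is needed (MERGE would instead overlay same-named    */
/* variables on matching DSCD/FYEAR keys), and nothing is written here. */
data pooled1_v / view=pooled1_v;
  set pretr_r posttr_r prebn_r postbn_r;
run;

/* The only sort left: write POOLED1 in the order PROC REG's BY needs. */
proc sort data=pooled1_v out=pooled1;
  by replicate;
run;
```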
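For the variance and pre/post t-test steps, a possible way to drop one sort, assuming the residual data set has the same layout as POOLEDTR_REG1: sort BY TYPE2 POST REPLICATE instead of POST TYPE2 REPLICATE. PROC MEANS then writes POOLEDTR_NI already grouped by TYPE2, so PROC TTEST can use BY TYPE2 without another PROC SORT. The residuals below are random numbers made up for the sketch.

```sas
/* Invented stand-in for POOLEDTR_REG1 (PROC REG output with residuals). */
data pooledtr_reg1;
  call streaminit(7);
  do type2 = 0 to 1;
    do post = 0 to 1;
      do replicate = 1 to 20;
        do i = 1 to 30;
          /* post-period residuals get a larger spread on purpose */
          ni_dresid = rand('normal', 0, 1 + post);
          output;
        end;
      end;
    end;
  end;
  drop i;
run;

/* Sort once, with TYPE2 leading, so the later steps can reuse the order. */
proc sort data=pooledtr_reg1;
  by type2 post replicate;
run;

/* One residual variance per TYPE2 x POST x REPLICATE cell. */
proc means data=pooledtr_reg1 noprint;
  by type2 post replicate;
  var ni_dresid;
  output out=pooledtr_ni(drop=_type_ _freq_) var=ni_var;
run;

/* POOLEDTR_NI is already sorted by TYPE2, so no PROC SORT is needed. */
proc ttest data=pooledtr_ni plots=none;
  by type2;
  class post;
  var ni_var;
run;
```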
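For the difference-in-differences step, LAG1000 pairs each post-period variance with the row 1,000 observations earlier, which is the matching pre-period replicate only when every TYPE2/POST cell holds exactly 1,000 replicates in the same order. A sketch of an alternative that pairs rows by key instead: MERGE the pre and post halves of POOLEDTR_NI on TYPE2 and REPLICATE, which keeps working if the replicate count changes. The small POOLEDTR_NI below is invented, with values built so that each post-minus-pre difference equals REPLICATE, which makes the pairing easy to check; VAR_PRE and VAR_POST are new, illustrative names.

```sas
/* Invented stand-in for POOLEDTR_NI, sorted BY TYPE2 POST REPLICATE. */
/* NI_VAR is built so that post minus pre equals REPLICATE.           */
data pooledtr_ni;
  do type2 = 0 to 1;
    do post = 0 to 1;
      do replicate = 1 to 5;
        ni_var = replicate * (1 + post) + 100 * type2;
        output;
      end;
    end;
  end;
run;

/* Pair each replicate's pre and post variance by key, not row count.  */
/* The WHERE= options split the one input into pre and post halves,    */
/* each still sorted by TYPE2 REPLICATE.                               */
data pooledtr_ni2;
  merge pooledtr_ni(where=(post=0) rename=(ni_var=var_pre))
        pooledtr_ni(where=(post=1) rename=(ni_var=var_post));
  by type2 replicate;
  diff = var_post - var_pre;
  drop post;
run;

/* Difference-in-differences: compare the mean DIFF across firm types. */
proc ttest data=pooledtr_ni2 plots=none;
  class type2;
  var diff;
run;
```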
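To repeat the whole test across the 20-30 sample combinations, one option is a macro that takes the four input data sets and an output prefix, with one call per combination generated from a control data set by CALL EXECUTE. This is a hypothetical sketch: %DID_TEST, its parameters, the WORK tables _POOL, _POOL_V and _REG, the residual names RESID and RESID_VAR, and the control data set COMBOS are illustrative names, not anything from the original code, and the stand-in inputs are random data. The macro assumes the four inputs can be stacked with SET and carry POST, TYPE2 and REPLICATE; it sorts each large table once, pairs pre and post on REPLICATE, and deletes its intermediates so WORK does not fill up across combinations.

```sas
/* Hypothetical macro: one full test for one sample combination.      */
%macro did_test(pretr=, posttr=, prebn=, postbn=, out=, y=NI_D,
                xvars=GROWTH EISSUE LEV DISSUE TURN SIZE CF_S NUMEX XLIST CLOSE ROL);
  /* Stack via a view; only the sorted copy is written to WORK. */
  data _pool_v / view=_pool_v;
    set &pretr &posttr &prebn &postbn;
  run;
  proc sort data=_pool_v out=_pool; by replicate; run;

  /* One OLS regression per bootstrap replicate; keep the residuals. */
  proc reg data=_pool noprint;
    by replicate;
    model &y = &xvars;
    output out=_reg r=resid;
  run; quit;

  /* Residual variance per TYPE2 x POST x REPLICATE cell. */
  proc sort data=_reg; by type2 post replicate; run;
  proc means data=_reg noprint;
    by type2 post replicate;
    var resid;
    output out=&out._var(drop=_type_ _freq_) var=resid_var;
  run;

  /* Pre/post t-test by firm type; &out._var is already sorted by TYPE2. */
  proc ttest data=&out._var plots=none;
    by type2; class post; var resid_var;
  run;

  /* Difference-in-differences, pairing pre and post on REPLICATE. */
  data &out._did;
    merge &out._var(where=(post=0) rename=(resid_var=var_pre))
          &out._var(where=(post=1) rename=(resid_var=var_post));
    by type2 replicate;
    diff = var_post - var_pre;
    drop post;
  run;
  proc ttest data=&out._did plots=none;
    class type2; var diff;
  run;

  /* Drop the large intermediates before the next combination. */
  proc datasets lib=work nolist;
    delete _pool _reg;
    delete _pool_v / memtype=view;
  run; quit;
%mend did_test;

/* Invented stand-in inputs so the sketch runs on its own. */
data pretr_r posttr_r prebn_r postbn_r;
  call streaminit(99);
  array xs{*} growth eissue lev dissue turn size cf_s numex xlist close rol;
  do type2 = 0 to 1;
    do post = 0 to 1;
      do replicate = 1 to 10;
        do dscd = 1 to 40;
          fyear = 2000 + post;
          do j = 1 to dim(xs); xs{j} = rand('normal'); end;
          ni_d = 0.5 * growth + rand('normal', 0, 1 + post);
          if type2 = 1 and post = 0 then output pretr_r;
          else if type2 = 1 then output posttr_r;
          else if post = 0 then output prebn_r;
          else output postbn_r;
        end;
      end;
    end;
  end;
  drop j;
run;

/* Hypothetical control table: one row per sample combination. */
data combos;
  length pretr posttr prebn postbn $32 out $20;
  input pretr $ posttr $ prebn $ postbn $ out $;
  datalines;
PRETR_R POSTTR_R PREBN_R POSTBN_R TEST1
;
run;

/* Queue one %DID_TEST call per row; %NRSTR delays the macro until the */
/* generated code runs after this step.                                */
data _null_;
  set combos;
  call execute(cats('%nrstr(%did_test)(pretr=', pretr, ', posttr=', posttr,
                    ', prebn=', prebn, ', postbn=', postbn, ', out=', out, ')'));
run;
```

Each additional row in COMBOS, with its own OUT prefix, queues another combination without copying the code.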