I've been working on a list of linear regressions and have run into a problem when I am missing variables. Currently my code looks like this
proc reg data=work.danreg outest=work.regoutput edf;
'11/15/14'n:model lnBeansY0M10 = lnBeansY0M11/noprint;
'12/15/14'n:model lnBeansY0M10 = lnBeansY0M12/noprint;
'01/15/15'n:model lnBeansY0M10 = lnBeansY1M01/noprint;
'02/15/15'n:model lnBeansY0M10 = lnBeansY1M02/noprint;
'03/15/15'n:model lnBeansY0M10 = lnBeansY1M03/noprint;
'04/15/15'n:model lnBeansY0M10 = lnBeansY1M04/noprint;
'05/15/15'n:model lnBeansY0M10 = lnBeansY1M05/noprint;
'06/15/15'n:model lnBeansY0M10 = lnBeansY1M06/noprint;
'07/15/15'n:model lnBeansY0M10 = lnBeansY1M07/noprint;
'08/15/15'n:model lnBeansY0M10 = lnBeansY1M08/noprint;
'09/15/15'n:model lnBeansY0M10 = lnBeansY1M09/noprint;
'10/15/15'n:model lnBeansY0M10 = lnBeansY1M10/noprint;
'11/15/15'n:model lnBeansY0M10 = lnBeansY1M11/noprint;
'12/15/15'n:model lnBeansY0M10 = lnBeansY1M12/noprint;
'01/15/15'n:model lnBeansY0M10 = lnBeansY2M01/noprint;
'03/15/15'n:model lnBeansY0M10 = lnBeansY2M03/noprint;
'04/15/15'n:model lnBeansY0M10 = lnBeansY2M04/noprint;
'05/15/15'n:model lnBeansY0M10 = lnBeansY2M05/noprint;
'06/15/15'n:model lnBeansY0M10 = lnBeansY2M06/noprint;
'07/15/15'n:model lnBeansY0M10 = lnBeansY2M07/noprint;
This runs fine if all of the independent variables exist. But occasionally one of the independent variables doesn't get pulled in for a particular day (ex. if lnBeansY2M06 didn't exist) and the entire proc statement cancels out. Is there any way around this to just have the proc statement skip over regressions with missing variables?
I'm fairly sure it is up to you to ensure the variable exists and/or has non-missing values for independent variables in model statement.
If the variable doesn't exist then how does it end up in the code? if you are generating the code using a macro then you might want to add a step to skip creating that line when the variable doesn't exist.
I would transpose work.danreg to have one independent X variable and a BY variable with values INBEANSYnMmm.
Show example of WORK.DANREG.
My data set is
data work.danreg;
set prod_stg.&lastdat;
BeansY0M10 = S_00998_Y0_M10 / lag(S_00998_Y0_M10);
lnBeansY0M10 = log(BeansY0M10);
BeansY0M11 = S_00998_Y0_M11 / lag(S_00998_Y0_M11);
lnBeansY0M11 = log(BeansY0M11);
BeansY0M12 = S_00998_Y0_M12 / lag(S_00998_Y0_M12);
lnBeansY0M12 = log(BeansY0M12);
BeansY1M01 = S_00998_Y1_M01 / lag(S_00998_Y1_M01);
lnBeansY1M01 = log(BeansY1M01);
BeansY1M02= S_00998_Y1_M02 / lag(S_00998_Y1_M02);
lnBeansY1M02 = log(BeansY1M02);
BeansY1M03 = S_00998_Y1_M03 / lag(S_00998_Y1_M03);
lnBeansY1M03 = log(BeansY1M03);
BeansY1M04 = S_00998_Y1_M04 / lag(S_00998_Y1_M04);
lnBeansY1M04 = log(BeansY1M04);
BeansY1M05 = S_00998_Y1_M05 / lag(S_00998_Y1_M05);
lnBeansY1M05 = log(BeansY1M05);
BeansY1M06 = S_00998_Y1_M06 / lag(S_00998_Y1_M06);
lnBeansY1M06 = log(BeansY1M06);
BeansY1M07 = S_00998_Y1_M07 / lag(S_00998_Y1_M07);
lnBeansY1M07 = log(BeansY1M07);
BeansY1M08 = S_00998_Y1_M08 / lag(S_00998_Y1_M08);
lnBeansY1M08 = log(BeansY1M08);
BeansY1M09 = S_00998_Y1_M09 / lag(S_00998_Y1_M09);
lnBeansY1M09 = log(BeansY1M09);
BeansY1M10 = S_00998_Y1_M10 / lag(S_00998_Y1_M10);
lnBeansY1M10 = log(BeansY1M10);
BeansY1M11 = S_00998_Y1_M11 / lag(S_00998_Y1_M11);
lnBeansY1M11 = log(BeansY1M11);
BeansY1M12 = S_00998_Y1_M12 / lag(S_00998_Y1_M12);
lnBeansY1M12 = log(BeansY1M12);
BeansY2M01 = S_00998_Y2_M01 / lag(S_00998_Y2_M01);
lnBeansY2M01 = log(BeansY2M01);
BeansY2M03 = S_00998_Y2_M03 / lag(S_00998_Y2_M03);
lnBeansY2M03 = log(BeansY2M03);
BeansY2M04 = S_00998_Y2_M04 / lag(S_00998_Y2_M04);
lnBeansY2M04 = log(BeansY2M04);
BeansY2M05 = S_00998_Y2_M05 / lag(S_00998_Y2_M05);
lnBeansY2M05 = log(BeansY2M05);
BeansY2M06 = S_00998_Y2_M06 / lag(S_00998_Y2_M06);
lnBeansY2M06 = log(BeansY2M06);
BeansY2M07 = S_00998_Y2_M07 / lag(S_00998_Y2_M07);
lnBeansY2M07 = log(BeansY2M07);
I wanted to a representative sample of the rows and columns in work.DANREG but I think I would rather see a sample of the row and columns in prod_stg.&lastdat maybe both.
I have attached an old data file I use for my regression and will paste my entire code below. What I'd like to do is to take the natural log of the daily change for every month but have the proc statement ignore months that do not come through in the data set.
%let lastdat=;
/*Macro that automatically pulls in most recent data*/
proc sql noprint;
select memname into :lastdat from dictionary.tables where
upcase(libname) = 'PROD_STG' and upcase(memname) like 'PRICES_TRANSPOSE_%'
order by input(scan(memname,-1,'_'),date.) desc;
quit;
/*taking the natural log of the daily change for each commodity month*/
data work.danreg;
set prod_stg.&lastdat;
cornY0M10 = CYELLOW_00998_Y0_M10 / lag(CYELLOW_00998_Y0_M10);
lncornY0M10 = log(cornY0M10);
cornY0M11 = CYELLOW_00998_Y0_M11 / lag(CYELLOW_00998_Y0_M11);
lncornY0M11 = log(cornY0M11);
cornY0M12 = CYELLOW_00998_Y0_M12 / lag(CYELLOW_00998_Y0_M12);
lncornY0M12 = log(cornY0M12);
cornY1M01 = CYELLOW_00998_Y1_M01 / lag(CYELLOW_00998_Y1_M01);
lncornY1M01 = log(cornY1M01);
cornY1M02= CYELLOW_00998_Y1_M02 / lag(CYELLOW_00998_Y1_M02);
lncornY1M02 = log(cornY1M02);
cornY1M03 = CYELLOW_00998_Y1_M03 / lag(CYELLOW_00998_Y1_M03);
lncornY1M03 = log(cornY1M03);
cornY1M04 = CYELLOW_00998_Y1_M04 / lag(CYELLOW_00998_Y1_M04);
lncornY1M04 = log(cornY1M04);
cornY1M05 = CYELLOW_00998_Y1_M05 / lag(CYELLOW_00998_Y1_M05);
lncornY1M05 = log(cornY1M05);
cornY1M06 = CYELLOW_00998_Y1_M06 / lag(CYELLOW_00998_Y1_M06);
lncornY1M06 = log(cornY1M06);
cornY1M07 = CYELLOW_00998_Y1_M07 / lag(CYELLOW_00998_Y1_M07);
lncornY1M07 = log(cornY1M07);
cornY1M08 = CYELLOW_00998_Y1_M08 / lag(CYELLOW_00998_Y1_M08);
lncornY1M08 = log(cornY1M08);
cornY1M09 = CYELLOW_00998_Y1_M09 / lag(CYELLOW_00998_Y1_M09);
lncornY1M09 = log(cornY1M09);
cornY1M10 = CYELLOW_00998_Y1_M10 / lag(CYELLOW_00998_Y1_M10);
lncornY1M10 = log(cornY1M10);
cornY1M11 = CYELLOW_00998_Y1_M11 / lag(CYELLOW_00998_Y1_M11);
lncornY1M11 = log(cornY1M11);
cornY1M12 = CYELLOW_00998_Y1_M12 / lag(CYELLOW_00998_Y1_M12);
lncornY1M12 = log(cornY1M12);
cornY2M01 = CYELLOW_00998_Y2_M01 / lag(CYELLOW_00998_Y2_M01);
lncornY2M01 = log(cornY2M01);
cornY2M03 = CYELLOW_00998_Y2_M03 / lag(CYELLOW_00998_Y2_M03);
lncornY2M03 = log(cornY2M03);
cornY2M04 = CYELLOW_00998_Y2_M04 / lag(CYELLOW_00998_Y2_M04);
lncornY2M04 = log(cornY2M04);
cornY2M05 = CYELLOW_00998_Y2_M05 / lag(CYELLOW_00998_Y2_M05);
lncornY2M05 = log(cornY2M05);
cornY2M06 = CYELLOW_00998_Y2_M06 / lag(CYELLOW_00998_Y2_M06);
lncornY2M06 = log(cornY2M06);
cornY2M07 = CYELLOW_00998_Y2_M07 / lag(CYELLOW_00998_Y2_M07);
lncornY2M07 = log(cornY2M07);
cornY2M08 = CYELLOW_00998_Y2_M08 / lag(CYELLOW_00998_Y2_M08);
lncornY2M08 = log(cornY2M08);
cornY2M10 = CYELLOW_00998_Y2_M10 / lag(CYELLOW_00998_Y2_M10);
lncornY2M10 = log(cornY2M10);
run;
/*taking numerous single linear regressions of the front month
against each subsequent month*/
proc reg data=work.danreg outest=work.regoutput edf;
'11/15/14'n:model lncornY0M10 = lncornY0M11/noprint;
'12/15/14'n:model lncornY0M10 = lncornY0M12/noprint;
'01/15/15'n:model lncornY0M10 = lncornY1M01/noprint;
'02/15/15'n:model lncornY0M10 = lncornY1M02/noprint;
'03/15/15'n:model lncornY0M10 = lncornY1M03/noprint;
'04/15/15'n:model lncornY0M10 = lncornY1M04/noprint;
'05/15/15'n:model lncornY0M10 = lncornY1M05/noprint;
'06/15/15'n:model lncornY0M10 = lncornY1M06/noprint;
'07/15/15'n:model lncornY0M10 = lncornY1M07/noprint;
'08/15/15'n:model lncornY0M10 = lncornY1M08/noprint;
'09/15/15'n:model lncornY0M10 = lncornY1M09/noprint;
'10/15/15'n:model lncornY0M10 = lncornY1M10/noprint;
'11/15/15'n:model lncornY0M10 = lncornY1M11/noprint;
'12/15/15'n:model lncornY0M10 = lncornY1M12/noprint;
'01/15/15'n:model lncornY0M10 = lncornY2M01/noprint;
'03/15/15'n:model lncornY0M10 = lncornY2M03/noprint;
'04/15/15'n:model lncornY0M10 = lncornY2M04/noprint;
'05/15/15'n:model lncornY0M10 = lncornY2M05/noprint;
'06/15/15'n:model lncornY0M10 = lncornY2M06/noprint;
'07/15/15'n:model lncornY0M10 = lncornY2M07/noprint;
/*'08/15/15'n:model lncornY0M10 = lncornY2M08/noprint;
this is inactive since it stops the proc step*/
'10/15/15'n:model lncornY0M10 = lncornY2M10/noprint;
run;
data want;
set work.regoutput;
coefficient=coalesce(of lncornY0M11, lncornY0M12, lncornY1M01, lncornY1M02, lncornY1M03, lncornY1M04, lncornY1M05,
lncornY1M06, lncornY1M07, lncornY1M08, lncornY1M09, lncornY1M10, lncornY1M11, lncornY1M12, lncornY2M01, lncornY2M03,
lncornY2M04, lncornY2M05, lncornY2M06, lncornY2M07, lncornY2M08, lncornY2M10);
drop lncornY0M11 lncornY0M12 lncornY1M01 lncornY1M02 lncornY1M03 lncornY1M04 lncornY1M05
lncornY1M06 lncornY1M07 lncornY1M08 lncornY1M09 lncornY1M10 lncornY1M11 lncornY1M12 lncornY2M01 lncornY2M03
lncornY2M04 lncornY2M05 lncornY2M06 lncornY2M07 lncornY2M08 lncornY2M10;
run;
proc print;
VAR _MODEL_ Intercept _RSQ_ coefficient;
Run;
Is there a macro I'd be able to use that would create the natural log of the daily change (currently lnBeans) by what variables are available in the dataset? something like if S_00998_Y0M10 exists create lnBeansY0M10, but have it start at Y0M01 and go though Y2M12? Y0 is current year and M stands for the month within that year if that's any help.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.