Hello everyone,
I'm a beginner in SAS and would like to do 5 linear regressions by using only one model, do loops and macro.
My independant variable is always the same but my dependant variable has to change each time: f1 f2 f3 f4 f5.
The trouble is I have only 1 observation instead of 5 as results.
Could you please tell me where my mistakes are ?
1510 /***M1***/
1511 options notes;
1512 options mprint;
1513
1514 %macro RunReg(DSName, NumVars);
1515 options nonotes; /* prevents the SAS log from overflowing */
1516 proc datasets nolist;
1517 delete AllStats; /* delete this data set if it exists */
1518 run;
1519
1520 %do i = 1 %to 5; /* repeat for each f&i */
1521 proc reg data=test noprint
1522 outest=PE(rename=(f&i=Value)); /* save parameter estimates */
1523 model f&i = rm; /* model f&i = rm */
1524 quit;
1525
1526 /* use PROC APPEND to accumulate statistics */
1527 proc append base=AllStats data=PE; run;
1528 %end;
1529 options notes;
1530 %mend;
1531 PROC PRINT DATA=PE;
1532 RUN;
Thanks for your help !!!
@pmorel3 wrote:
Hello everyone,
I'm a beginner in SAS and would like to do 5 linear regressions by using only one model, do loops and macro.
My independant variable is always the same but my dependant variable has to change each time: f1 f2 f3 f4 f5.
The trouble is I have only 1 observation instead of 5 as results.
Could you please tell me where my mistakes are ?
1510 /***M1***/ 1511 options notes; 1512 options mprint; 1513 1514 %macro RunReg(DSName, NumVars); 1515 options nonotes; /* prevents the SAS log from overflowing */ 1516 proc datasets nolist; 1517 delete AllStats; /* delete this data set if it exists */ 1518 run; 1519 1520 %do i = 1 %to 5; /* repeat for each f&i */ 1521 proc reg data=test noprint 1522 outest=PE(rename=(f&i=Value)); /* save parameter estimates */ 1523 model f&i = rm; /* model f&i = rm */ 1524 quit; 1525 1526 /* use PROC APPEND to accumulate statistics */ 1527 proc append base=AllStats data=PE; run; 1528 %end; 1529 options notes; 1530 %mend; 1531 PROC PRINT DATA=PE; 1532 RUN;
Thanks for your help !!!
Since you are using proc Reg, which will allow multiple models, there is no loop of any type. Provide a name for each model and that's all that is needed.
Here is an example:
proc reg data=sashelp.class outest=work.est; modelheight: model height=age; modelweight: model weight=age; run; quit;
The label word and : before the model names the output.
Note that the outset data set has each row of estimates labeled with the model name.
Thanks ballardw,
I tried here to simplify the case but If I had like 2000 dependant variables: f1 f2 f3 f4 f5...f1000 and I didn't want to provide a name for all my 2000 models, would a loop be suitable to use ? If so, how could i do ?
@pmorel3 wrote:
Thanks ballardw,
I tried here to simplify the case but If I had like 2000 dependant variables: f1 f2 f3 f4 f5...f1000 and I didn't want to provide a name for all my 2000 models, would a loop be suitable to use ? If so, how could i do ?
Restructure the data so that instead of having 1000 variables you have 2: a label and a value:
data example; input x y1-y5; datalines; 1 1 2 3 4 5 2 2 5 7 8 9 3 2.5 6 10 13 14.7 ; data work.model; set example; array y y1-y5; do i=1 to dim(y); label=vname(y[i]); value=y[i]; output; end; keep x label value; run; proc sort data=work.model; by label; run; proc reg data=work.model outest=work.est ; by label; model value=x; run; quit;
The output will have the LABEL value to indicate the model row.
I used the data step to transpose the data instead of proc transpose as you may have multiple observations with the same independent value and proc transpose would create multiple "value" columns if that is the case which would not hep modeling.
Thank you so much ballardw. It works !!!
For each linear regression, I would like to compute automatically, by using my regression coeffcients, this variable: alphaM1 = rp - rm*(reg coef of rm)
without writing by hand my coefficients, what should I do ?
Obviously, I will have 5 series of ''alphaM1''
@pmorel3 wrote:
Thank you so much ballardw. It works !!!
For each linear regression, I would like to compute automatically, by using my regression coeffcients, this variable: alphaM1 = rp - rm*(reg coef of rm)
without writing by hand my coefficients, what should I do ?
Obviously, I will have 5 series of ''alphaM1''
Where does RP come from? Are the values in a data set?
Are you worried abou tthe intercept coefficient in what you are doing here?
And why 5 series of AlphaM1? You mentioned thousands of Y variables but nothing about 5 series of anything. Or are you just paralleling my example with 5 Y variables?
You can evaluate all of the values rp, what ever that comes from and x if they are in a data set. To extend our previous example:
First create a data set with the X values to evaluate the equation at. If RP needs to be set this you can add it to this data set.
data work.score_x; input x; datalines; .75 1.2 2.4 2.8 3.3 ;
Combine with the model parameter estimates. Since you appear to need to apply the same values to ALL of the models then Proc SQL is going to be the tool to combine the sets and do the calculation:
proc sql; create table work.scored as select a.label,a.x as xparm,b.x, (3 - a.x*b.x) as Alpha from work.est as a, work.score_x as b order by a.label, b.x ; quit;
This maintains the Label variable so you know which model was used and adds the the parameter for X from the model estimate results, has the X value used for evaluation and the calculation result.
If RP was the model estimate you would use a.intercept in the place of the 3 above. If it is another variable in the work.score_x data set then use B.RP instead of the 3.
The order by clause means the output will be sorted by the label and the values of x.
The "work.est as a" establishes an alias so you can reference which data set a specific variable comes from. See the A.X and B.X, same variable name on input.
You will get a message in the Log:
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
because what you are doing is combining every row in each set with every row in the other. This is often referred to as a Cartesian Product.
Now if the purpose of your equation is to get the predicted Y values for X values that you are interested in but do not have Y values directly then you could use the STORE statement to create a special format set that can be used by another procedure to do that evaluation:
proc reg data=work.model outest=work.est ; by label; model value=x; store out=work.store; run; quit; data work.score_x; input x; datalines; .75 1.2 2.4 2.8 3.3 ; proc plm restore=work.store ; score data=work.score_x out=work.scoreresults; run;
Since the work.score_x does not have any specific value of by variables then it evaluates all of the models.
If you were interested just some of the "label" values for specific X values then you would add the appropriate value with the same name as it appeared in Proc Reg. If your by variable is character, then make sure the case is the same, Y1 wouldn't match y1 in the model.
Example:
data work.score_x; input x label $; datalines; .75 y1 1.2 y3 2.4 y1 2.8 y2 3.3 y1 ;
This ability to create a stored model parameter this way is extremely powerful. Create the Store set in a permanent library and then any time you want to reuse the model for scoring other data as questions arise.
This is my code after it worked the first time:
/***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/
data work.model;
set test;
array f f1-f5;
do i=1 to dim(f);
label=vname(f[i]);
value=f[i];
output;
end;
keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
run;
proc sort data=work.model;
by label;
run;
proc reg data=work.model
outest=work.estM1
;
by label;
model value=rm;
run;
quit;
PROC PRINT DATA=work.estM1;
RUN;
Now to answer your last comment:
I'm sorry but it should be obviously: "alphaM1 = Value - rm*(reg coef of rm)".
You referenced "value" before as the main model "model value=rm"
Yes, I am interested in intercept coefficient.
Yes, I was just paralleling your example with 5 Y variables.
Let me just try what you explained and I'll be back !
I used your code and there is no error message.
However, I don't understand why my alphaM1 is the same all over the time for each label. Indeed, I try to compute AlphaM1 = (value - rm*reg coeff of rm) for each label. As "value" correspond with f1 to f5 (or f1000, if we expand our example ;)) and "rm" is the same for each label, I should have 5 time series of AlphaM1. In other words, I should have 5 variables of AlphaM1 for each label (f1 to f5). It's not the case here.
Added to that, how can I present the 5 time series of AlphaM1 side by side in order to have only 360 observations instead of 1800 (360*5)?
options notes;
options mprint;
/***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/
data work.model;
set test;
array f f1-f5;
do i=1 to dim(f);
label=vname(f[i]);
value=f[i];
output;
end;
keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
run;
proc sort data=work.model;
by label;
run;
proc reg data=work.model
outest=work.estM1
;
by label;
model value=rm;
run;
quit;
PROC PRINT DATA=work.estM1;
RUN;
/***M1: obtenir les 5 séries de alphaM1 dans une seule base de données***/
data work.score_x;
set test;
keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean;
RUN;
proc sql;
create table work.scoredM1 as
select a.label,a.rm as xparm,b.rm,
(value - a.rm) as AlphaM1
from work.estM1 as a, work.score_x as b
order by a.label, b.rm
;
quit;
RUN;
proc print data=work.scoredM1;
run;
How can I adapt theses codes if RP is f1 to f5 (or f1 to ...f1000)?
Indeed, I would like to compute automatically 5 (1000) different time series:
alphaM1f1 = f1 - (rm*reg coef of rm in f1)
alphaM1f2 = f2 - (rm*reg coef of rm in f2)
alphaM1f3 = f3 - (rm*reg coef of rm in f3)
alphaM1f4 = f4 - (rm*reg coef of rm in f4)
alphaM1f5 = f5 - (rm*reg coef of rm in f5)
...
alphaM1f1000 = f1000 - (rm*reg coef of rm in f1000)
When I ran these codes, I noticed 2 weird things even though I don't have any error message:
1- "Value" I've specified in line 1687 is not considered in line 1717 anymore. Indeed, as I didn't have expected results in my output: line 1724 (I try to calculate AlphaM1 of f1 by excel and I did not have the right time serie), I tried to refind the value of "value" mentionned in my proc sql (line 1717) and each time, I found -1 as value of "Value" instead of f[i].
Can someone explain me why ?
2- "rm" I've kept in line 1711 is not the same I discovered in my proc print (line 1724). Can someone explain me why ?
1678 options notes;
1679 options mprint;
1680 /***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/
1681
1682 data work.model;
1683 set test;
1684 array f f1-f5;
1685 do i=1 to dim(f);
1686 label=vname(f[i]);
1687 value=f[i];
1688 output;
1689 end;
1690 keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
1691 run;
NOTE: There were 360 observations read from the data set WORK.TEST.
NOTE: The data set WORK.MODEL has 1800 observations and 14 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
real time 0.01 seconds
cpu time 0.03 seconds
1692
1693 proc sort data=work.model;
1694 by label;
1695 run;
NOTE: There were 1800 observations read from the data set WORK.MODEL.
NOTE: The data set WORK.MODEL has 1800 observations and 14 variables.
NOTE: PROCEDURE SORT a utilisé (Durée totale du traitement) :
real time 0.00 seconds
cpu time 0.03 seconds
1696
1697 proc reg data=work.model
1698 outest=work.estM1
1699 ;
1700 by label;
1701 model value=rm;
1702 run;
NOTE: Interactivity disabled with BY processing.
NOTE: PROCEDURE REG a utilisé (Durée totale du traitement) :
real time 3.08 seconds
cpu time 1.45 seconds
NOTE: The data set WORK.ESTM1 has 5 observations and 8 variables.
1703 quit;
1704
1705 PROC PRINT DATA=work.estM1;
1706 RUN;
NOTE: There were 5 observations read from the data set WORK.ESTM1.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
real time 0.00 seconds
cpu time 0.00 seconds
1707 /***M1: obtenir les 5 séries de alphaM1 dans une seule base de données***/
1708
1709 data work.score_x;
1710 set test;
1711 keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean;
1712 RUN;
NOTE: There were 360 observations read from the data set WORK.TEST.
NOTE: The data set WORK.SCORE_X has 360 observations and 12 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
real time 0.00 seconds
cpu time 0.00 seconds
1713 proc sql;
1714 create table work.scoredM1 as
1715 select a.label, b.rm, a.rm as rm_parm,
1716 (b.rm*a.rm) as scndtrm,
1717 (value - (b.rm*a.rm)) as AlphaM1
1718 from work.estM1 as a, work.score_x as b
1719 order by a.label, b.rm
1720 ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.SCOREDM1 created, with 1800 rows and 5 columns.
1721 quit;
NOTE: PROCEDURE SQL a utilisé (Durée totale du traitement) :
real time 0.01 seconds
cpu time 0.01 seconds
1722 RUN;
1723
1724 proc print data=work.scoredM1;
1725 run;
NOTE: There were 1800 observations read from the data set WORK.SCOREDM1.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
real time 0.25 seconds
cpu time 0.25 seconds
1726
1727 proc print data=work.score_x;
1728 run;
NOTE: There were 360 observations read from the data set WORK.SCORE_X.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
real time 0.10 seconds
cpu time 0.09 seconds
?
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.