Re: There were 1 observations read from the data set WORK.PE

pmorel3 · Posted 02-06-2020 10:28 AM

Hello everyone,

I'm a beginner in SAS and would like to do 5 linear regressions by using only one model, do loops and macro.

My independant variable is always the same but my dependant variable has to change each time: f1 f2 f3 f4 f5.

The trouble is I have only 1 observation instead of 5 as results.

Could you please tell me where my mistakes are ?


1510  /***M1***/
1511  options notes;
1512  options mprint;
1513
1514  %macro RunReg(DSName, NumVars);
1515  options nonotes;     /* prevents the SAS log from overflowing */
1516  proc datasets nolist;
1517    delete AllStats;   /* delete this data set if it exists */
1518  run;
1519
1520  %do i = 1 %to 5;         /* repeat for each f&i */
1521     proc reg data=test noprint
1522              outest=PE(rename=(f&i=Value)); /* save parameter estimates */
1523     model f&i = rm;               /* model f&i = rm */
1524     quit;
1525
1526     /* use PROC APPEND to accumulate statistics */
1527     proc append base=AllStats data=PE; run;
1528  %end;
1529  options notes;
1530  %mend;

1531    PROC PRINT DATA=PE;
1532    RUN;

Thanks for your help !!!

ballardw · Posted 02-06-2020 10:48 AM

@pmorel3 wrote:

Hello everyone,

I'm a beginner in SAS and would like to do 5 linear regressions by using only one model, do loops and macro.

My independant variable is always the same but my dependant variable has to change each time: f1 f2 f3 f4 f5.

The trouble is I have only 1 observation instead of 5 as results.

Could you please tell me where my mistakes are ?
1510  /***M1***/
1511  options notes;
1512  options mprint;
1513
1514  %macro RunReg(DSName, NumVars);
1515  options nonotes;     /* prevents the SAS log from overflowing */
1516  proc datasets nolist;
1517    delete AllStats;   /* delete this data set if it exists */
1518  run;
1519
1520  %do i = 1 %to 5;         /* repeat for each f&i */
1521     proc reg data=test noprint
1522              outest=PE(rename=(f&i=Value)); /* save parameter estimates */
1523     model f&i = rm;               /* model f&i = rm */
1524     quit;
1525
1526     /* use PROC APPEND to accumulate statistics */
1527     proc append base=AllStats data=PE; run;
1528  %end;
1529  options notes;
1530  %mend;

1531    PROC PRINT DATA=PE;
1532    RUN;
Thanks for your help !!!

Since you are using proc Reg, which will allow multiple models, there is no loop of any type. Provide a name for each model and that's all that is needed.

Here is an example:

proc reg data=sashelp.class outest=work.est;
   modelheight: model height=age;
   modelweight: model weight=age; 
run;
quit;

The label word and : before the model names the output.

Note that the outset data set has each row of estimates labeled with the model name.

pmorel3 · Posted 02-06-2020 11:00 AM

Thanks ballardw,

I tried here to simplify the case but If I had like 2000 dependant variables: f1 f2 f3 f4 f5...f1000 and I didn't want to provide a name for all my 2000 models, would a loop be suitable to use ? If so, how could i do ?

ballardw · Posted 02-06-2020 11:52 AM

@pmorel3 wrote:

Thanks ballardw,

I tried here to simplify the case but If I had like 2000 dependant variables: f1 f2 f3 f4 f5...f1000 and I didn't want to provide a name for all my 2000 models, would a loop be suitable to use ? If so, how could i do ?

Restructure the data so that instead of having 1000 variables you have 2: a label and a value:

data example;
    input x y1-y5;
datalines;
1 1 2 3 4 5
2 2 5 7 8 9
3 2.5 6 10 13 14.7
;

data work.model;
   set example;
   array y y1-y5;
   do i=1 to dim(y);
      label=vname(y[i]);
      value=y[i];
      output;
   end;
   keep x label value;
run;

proc sort data=work.model;
   by label;
run;

proc reg data=work.model
      outest=work.est
     ;
   by label;
   model value=x;
run;
quit;

The output will have the LABEL value to indicate the model row.

I used the data step to transpose the data instead of proc transpose as you may have multiple observations with the same independent value and proc transpose would create multiple "value" columns if that is the case which would not hep modeling.

ballardw · Posted 02-06-2020 01:48 PM

Thank you so much ballardw. It works !!!

For each linear regression, I would like to compute automatically, by using my regression coeffcients, this variable: alphaM1 = rp - rm*(reg coef of rm)

without writing by hand my coefficients, what should I do ?

Obviously, I will have 5 series of ''alphaM1''

ballardw · Posted 02-06-2020 06:22 PM

@pmorel3 wrote:

Thank you so much ballardw. It works !!!

For each linear regression, I would like to compute automatically, by using my regression coeffcients, this variable: alphaM1 = rp - rm*(reg coef of rm)

without writing by hand my coefficients, what should I do ?

Obviously, I will have 5 series of ''alphaM1''

Where does RP come from? Are the values in a data set?

Are you worried abou tthe intercept coefficient in what you are doing here?

And why 5 series of AlphaM1? You mentioned thousands of Y variables but nothing about 5 series of anything. Or are you just paralleling my example with 5 Y variables?

You can evaluate all of the values rp, what ever that comes from and x if they are in a data set. To extend our previous example:

First create a data set with the X values to evaluate the equation at. If RP needs to be set this you can add it to this data set.

data work.score_x;
   input x;
datalines;
.75
1.2
2.4
2.8
3.3
;

Combine with the model parameter estimates. Since you appear to need to apply the same values to ALL of the models then Proc SQL is going to be the tool to combine the sets and do the calculation:

proc sql;
   create table work.scored as
   select a.label,a.x as xparm,b.x,
          (3 - a.x*b.x) as Alpha
   from work.est as a, work.score_x as b
   order by a.label, b.x
   ;
quit;

This maintains the Label variable so you know which model was used and adds the the parameter for X from the model estimate results, has the X value used for evaluation and the calculation result.

If RP was the model estimate you would use a.intercept in the place of the 3 above. If it is another variable in the work.score_x data set then use B.RP instead of the 3.

The order by clause means the output will be sorted by the label and the values of x.

The "work.est as a" establishes an alias so you can reference which data set a specific variable comes from. See the A.X and B.X, same variable name on input.

You will get a message in the Log:

NOTE: The execution of this query involves performing one or more Cartesian product joins that
      can not be optimized.

because what you are doing is combining every row in each set with every row in the other. This is often referred to as a Cartesian Product.

Now if the purpose of your equation is to get the predicted Y values for X values that you are interested in but do not have Y values directly then you could use the STORE statement to create a special format set that can be used by another procedure to do that evaluation:

proc reg data=work.model
      outest=work.est
     ;
   by label;
   model value=x;
   store out=work.store;
run;
quit;

data work.score_x;
   input x;
datalines;
.75
1.2
2.4
2.8
3.3
;

proc plm restore=work.store ;
   score data=work.score_x out=work.scoreresults;
run;

Since the work.score_x does not have any specific value of by variables then it evaluates all of the models.

If you were interested just some of the "label" values for specific X values then you would add the appropriate value with the same name as it appeared in Proc Reg. If your by variable is character, then make sure the case is the same, Y1 wouldn't match y1 in the model.

Example:

data work.score_x;
   input x label $;
datalines;
.75  y1
1.2  y3
2.4  y1
2.8  y2
3.3  y1
;

This ability to create a stored model parameter this way is extremely powerful. Create the Store set in a permanent library and then any time you want to reuse the model for scoring other data as questions arise.

pmorel3 · Posted 02-06-2020 06:43 PM

This is my code after it worked the first time:

/***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/

data work.model;
   set test;
   array f f1-f5;
   do i=1 to dim(f);
      label=vname(f[i]);
      value=f[i];
      output;
   end;
   keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
run;

proc sort data=work.model;
   by label;
run;

proc reg data=work.model
      outest=work.estM1
     ;
   by label;
   model value=rm;
run;
quit;

  PROC PRINT DATA=work.estM1;
  RUN;

Now to answer your last comment:

I'm sorry but it should be obviously: "alphaM1 = Value - rm*(reg coef of rm)".

You referenced "value" before as the main model "model value=rm"

Yes, I am interested in intercept coefficient.

Yes, I was just paralleling your example with 5 Y variables.

Let me just try what you explained and I'll be back !

pmorel3 · Posted 02-06-2020 07:57 PM

I used your code and there is no error message.

However, I don't understand why my alphaM1 is the same all over the time for each label. Indeed, I try to compute AlphaM1 = (value - rm*reg coeff of rm) for each label. As "value" correspond with f1 to f5 (or f1000, if we expand our example ;)) and "rm" is the same for each label, I should have 5 time series of AlphaM1. In other words, I should have 5 variables of AlphaM1 for each label (f1 to f5). It's not the case here.

Added to that, how can I present the 5 time series of AlphaM1 side by side in order to have only 360 observations instead of 1800 (360*5)?

options notes;
options mprint;
/***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/

data work.model;
   set test;
   array f f1-f5;
   do i=1 to dim(f);
      label=vname(f[i]);
      value=f[i];
      output;
   end;
   keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
run;

proc sort data=work.model;
   by label;
run;

proc reg data=work.model
      outest=work.estM1
     ;
   by label;
   model value=rm;
run;
quit;

  PROC PRINT DATA=work.estM1;
  RUN; 
 /***M1: obtenir les 5 séries de alphaM1 dans une seule base de données***/


data work.score_x;
   set test;
   keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean;
  RUN;
proc sql;
   create table work.scoredM1 as
   select a.label,a.rm as xparm,b.rm,
          (value - a.rm) as AlphaM1
   from work.estM1 as a, work.score_x as b
   order by a.label, b.rm
   ;
quit;
 RUN;

 proc print data=work.scoredM1;
 run;

pmorel3 · Posted 02-07-2020 09:45 AM

How can I adapt theses codes if RP is f1 to f5 (or f1 to ...f1000)?

Indeed, I would like to compute automatically 5 (1000) different time series:

alphaM1f1 = f1 - (rm*reg coef of rm in f1)

alphaM1f2 = f2 - (rm*reg coef of rm in f2)

alphaM1f3 = f3 - (rm*reg coef of rm in f3)

alphaM1f4 = f4 - (rm*reg coef of rm in f4)

alphaM1f5 = f5 - (rm*reg coef of rm in f5)

...

alphaM1f1000 = f1000 - (rm*reg coef of rm in f1000)

pmorel3 · Posted 02-07-2020 11:46 AM

When I ran these codes, I noticed 2 weird things even though I don't have any error message:

1- "Value" I've specified in line 1687 is not considered in line 1717 anymore. Indeed, as I didn't have expected results in my output: line 1724 (I try to calculate AlphaM1 of f1 by excel and I did not have the right time serie), I tried to refind the value of "value" mentionned in my proc sql (line 1717) and each time, I found -1 as value of "Value" instead of f[i]. Can someone explain me why ?

2- "rm" I've kept in line 1711 is not the same I discovered in my proc print (line 1724). Can someone explain me why ?

1678  options notes;
1679  options mprint;
1680  /***M1 obtenir les coeff de M1 dans une seule base de données pour les 5 fonds***/
1681
1682  data work.model;
1683     set test;
1684     array f f1-f5;
1685     do i=1 to dim(f);
1686        label=vname(f[i]);
1687        value=f[i];
1688        output;
1689     end;
1690     keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean label value;
1691  run;

NOTE: There were 360 observations read from the data set WORK.TEST.
NOTE: The data set WORK.MODEL has 1800 observations and 14 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
      real time           0.01 seconds
      cpu time            0.03 seconds


1692
1693  proc sort data=work.model;
1694     by label;
1695  run;

NOTE: There were 1800 observations read from the data set WORK.MODEL.
NOTE: The data set WORK.MODEL has 1800 observations and 14 variables.
NOTE: PROCEDURE SORT a utilisé (Durée totale du traitement) :
      real time           0.00 seconds
      cpu time            0.03 seconds


1696
1697  proc reg data=work.model
1698        outest=work.estM1
1699       ;
1700     by label;
1701     model value=rm;
1702  run;

NOTE: Interactivity disabled with BY processing.
NOTE: PROCEDURE REG a utilisé (Durée totale du traitement) :
      real time           3.08 seconds
      cpu time            1.45 seconds

NOTE: The data set WORK.ESTM1 has 5 observations and 8 variables.

1703  quit;
1704
1705    PROC PRINT DATA=work.estM1;
1706    RUN;

NOTE: There were 5 observations read from the data set WORK.ESTM1.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
      real time           0.00 seconds
      cpu time            0.00 seconds


1707   /***M1: obtenir les 5 séries de alphaM1 dans une seule base de données***/
1708
1709  data work.score_x;
1710     set test;
1711     keep rm zdy ztbl ztms zdfy rmrsq rm_zdy rm_ztbl rm_ztms rm_zdfy pred_mean rm_pred_mean;
1712    RUN;

NOTE: There were 360 observations read from the data set WORK.TEST.
NOTE: The data set WORK.SCORE_X has 360 observations and 12 variables.
NOTE: DATA statement a utilisé (Durée totale du traitement) :
      real time           0.00 seconds
      cpu time            0.00 seconds


1713  proc sql;
1714     create table work.scoredM1 as
1715     select a.label, b.rm, a.rm as rm_parm,
1716            (b.rm*a.rm) as scndtrm,
1717            (value - (b.rm*a.rm)) as AlphaM1
1718     from work.estM1 as a, work.score_x as b
1719     order by a.label, b.rm
1720     ;
NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized.
NOTE: Table WORK.SCOREDM1 created, with 1800 rows and 5 columns.

1721  quit;
NOTE: PROCEDURE SQL a utilisé (Durée totale du traitement) :
      real time           0.01 seconds
      cpu time            0.01 seconds


1722   RUN;
1723
1724   proc print data=work.scoredM1;
1725   run;

NOTE: There were 1800 observations read from the data set WORK.SCOREDM1.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
      real time           0.25 seconds
      cpu time            0.25 seconds


1726
1727   proc print data=work.score_x;
1728   run;

NOTE: There were 360 observations read from the data set WORK.SCORE_X.
NOTE: PROCEDURE PRINT a utilisé (Durée totale du traitement) :
      real time           0.10 seconds
      cpu time            0.09 seconds

pmorel3 · Posted 02-10-2020 09:53 AM

?

Register Today!

SAS Training: Just a Click Away