<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: OLS with no intercept by pairs of observations in SAS/IML Software and Matrix Computations</title>
    <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648828#M5112</link>
    <description>&lt;P&gt;Thank you, Ian. There is no question that my code is inefficient; I simply have little knowledge of IML. Creating the matrix at its final size would be great if I only knew how to make this happen. The data step code that I mentioned consists of PROC REG inside a macro loop, nothing special. With help from a system administrator to increase MEMSIZE to 12 GB, the data step code executes fairly quickly, and I suspect that the IML code with a few tweaks would be comparable. I apologize if it seemed I was criticizing IML for the slow processing time. Thanks again for your time!&amp;nbsp; Rick&lt;/P&gt;</description>
    <pubDate>Tue, 19 May 2020 13:33:45 GMT</pubDate>
    <dc:creator>rfrancis</dc:creator>
    <dc:date>2020-05-19T13:33:45Z</dc:date>
    <item>
      <title>OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648345#M5107</link>
      <description>&lt;P&gt;Hello, I am working to obtain OLS parameter estimates (with no intercept) for each possible pair of observations within subgroups. Let's say the year 1991 represents a group, then let's say there are two subgroups within 1991; the first subgroup contains 3 observations, which produces 3 possible pairs or combination (1-2, 2-3 and 1-3); and the second subgroup contains 4 observations, which produces 6 possible pairs or combinations (1-2, 1-3, 1-4, 2-3, 2-4 and 3-4). The goal is to generate OLS parameter estimates for a model such as Y = B1 + B2 + e (i.e., two indep vars and no intercept). If this sounds like the Theil-Sen method, you are spot on. However, note that this is the &lt;EM&gt;"multivariate version"&lt;/EM&gt;, which requires more than computing the slope for y = mx + b.&amp;nbsp; A fish much larger than I from the academic world has determined that OLS with no intercept and two observations will generate Theil-Sen slopes for B1 and B2 mentioned earlier (a modified r-square is necessary for OLS with no intercept, and will typically be between 99.5% and 100% when using the big fish methodology).. I have the "data step" SAS code from the big fish, but having witnessed the power and efficiency of IML, I set out to improve the lengthy processing time of the "data step" code. The IML code I developed below uses the OLS code from one of Rick Wicklin's blog, along with a loop developed by a SAS Community member.&amp;nbsp; It works at the group level, but I lack the IML knowledge to make it work at the subgroup level. I believe the key is to insert a DO loop within the existing DO loop to capture the dynamics of the subgroups. This will require reconstructing a new X matrix for each possible pair within a subgroup in order to compute the two slope values. If interested or helpful, I can supply the code from the big fish.&amp;nbsp; Thank you for any ideas or suggestions!&amp;nbsp; Rick&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data untrimmed;
     input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts lag2acr_ts;
     cards;
     1991 0 155 175 165 35
     1991 0 200 225 250 75
     1991 0 75 125 135 65
     1991 1 350 375 400 55
     1991 1 155 175 165 85 
     1991 1 200 225 250 100
     1991 1 75 125 135 125
     ;
  run;

/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;

proc iml;
start regress(XY);
        *c = allcomb(nrow(XY),2);       /* all "N choose 2" combinations of pairs */
		*c_rows=nrow(c);
		group = XY[1,4];        /* extract group from XY, to be used as a BY variable later */
        lag2cvrank = XY[1,5];   /* extract lag2cvrank from XY, to be used as a BY variable later */
		/* Extract x from XY */
		X = XY[c[i],{1 2}];       /* extract X from XY */ /* extract pairs, i.e, each combo in C */
		/* Extract y from XY */
		Y = XY[c[,],3];       /* extract Y from XY */
		xpx = x`*x;    /*cross-products*/
		xpy = x`*y;
		/*solve linear system*/
		/*Solution 1: compute inverse with INV (inefficient)*/
		xpxi = inv(xpx);    /*form inverse crossproducts*/
		b = xpxi*xpy;    /*solve for parameter estimates*/
		* Or a better solution ***;
		/*Solution 2: compute solution with SOLVE. More efficient*/
		b = (solve(xpx, xpy))`;    /* solve for parameter estimates*/
		t = nrow(XY);           /* number of rows in XY */
		group_col = group;
       	       	lag2cvrank_col = lag2cvrank;
		return (b || group_col || lag2cvrank_col);	
		end;	
  finish;

  /* read the BY groups */
  use FreqOut nobs NumGroups;
  read all var {group lag2cvrank};
  close FreqOut;

  use work.untrimmed;      
  create ts_fcst1_IML var {m m1 group_col lag2cvrank_col};
  setin work.untrimmed;    
  setout ts_fcst1_IML;

  inVarNames = {"lag2cfo_ts" "lag2acr_ts" "lagcfo_ts" "group" "lag2cvrank"};
  do i = 1 to NumGroups;                    /* for each BY group */
     read all var inVarNames into XY 
         where(group=(group[i]) &amp;amp; lag2cvrank=(lag2cvrank[i]));
		print xy;
	  /* X contains data for i_th group; analyze it */
	  c = allcomb(nrow(XY),2);       /* all "N choose 2" combinations of pairs */
		c_rows=nrow(c);	
	  do i = 1 to c_rows;  /* this is my feeble attempt at the new DO loop */
		G = regress(XY);
	 	/* extract the columns of the matrix */
     	        m=G[,1]; m1=G[,2]; group_col=G[,3]; lag2cvrank_col=G[,4];
                append;
	  end;
  end;

  close work.untrimmed;
  close ts_fcst1_IML;
quit;

data ts_fcst1_IML;
	set ts_fcst1_IML;
	proc print;run;
	
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Sat, 16 May 2020 23:37:47 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648345#M5107</guid>
      <dc:creator>rfrancis</dc:creator>
      <dc:date>2020-05-16T23:37:47Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648395#M5108</link>
      <description>&lt;P&gt;Since it sounds like this is an academic research project, I encourage you to think about the statistical computation you want to make for each combination from the subgroups. Suppose that cases 1 and 2 are being used for the first subgroup and cases 3 and 4 are being used for the second subgroup. How do you estimate B1 and B2 for those observations?&amp;nbsp; Be sure you completely understand the answer, and discuss it with your advisor if you have any questions.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;1. Write a function that performs that computation to make sure you really understand it. Make sure it also works when cases 1&amp;amp;2 are used for the first subgroup and cases 1&amp;amp;2 are used for the second subgroup.&lt;/P&gt;
&lt;P&gt;2.&amp;nbsp;Now think about how to handle all combinations from the subgroups. Use the ALLCOMB function on each subgroup to generate all pairwise combinations from subgroup 1 and from subgroup 2.&lt;/P&gt;
&lt;P&gt;3. Leverage the function you wrote in Step 1 to get all estimates over all pairwise combinations from the subgroups.&lt;/P&gt;
&lt;P&gt;4. Try to encapsulate steps 1-3 into a single function. The input to the function is a group. The output is the set of Sen-Theil estimates for the subgroups.&lt;/P&gt;
&lt;P&gt;5. After you have successfully completed steps 1-4, you can leverage your existing program to read each group, call the function in Step 4, and write the estimates to a SAS data set.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You are creating a big program, so you must break it into smaller pieces and then implement each small piece as efficiently as possible. In computer science lingo, this is sometimes called top-down design and bottom-up construction. Make sure you understand the underlying math/stats at each step before you begin to write any code.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck!&lt;/P&gt;</description>
      <pubDate>Sun, 17 May 2020 12:55:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648395#M5108</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2020-05-17T12:55:28Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648555#M5109</link>
      <description>&lt;P&gt;I developed the code below to solve this problem. The downside is that the IML processing time is much longer than the data-step processing time. Thank you, SAS Community!&amp;nbsp; Rick&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data untrimmed;
     input group lag2cvrank lagcfo_ts cfo_ts lag2cfo_ts lag2acr_ts;
     cards;
     1991 0 155 175 165 35
     1991 0 200 225 250 75
	 1991 0 75 125 135 65
     1991 1 350 375 400 55
     1991 1 155 175 165 85 
     1991 1 200 225 250 100
     1991 1 75 125 135 125
	 1992 3 100 250 175 35 
	 1992 3 125 175 250 100
	 1992 3 200 225 300 175
     ;
  run;


/* find unique BY-group combinations */
proc freq data=untrimmed;
tables group*lag2cvrank / out=FreqOut;
run;

proc iml;

start regress(XY);
		c = allcomb(nrow(XY),2);       /* all "N choose 2" combinations of pairs */
	    c_rows=nrow(c);
		b_storage_first={0 0};
		do i = 1 to c_rows;
			XY_NEW=XY[c[i,],]; 
			group = XY[1,4];        /* extract group from XY, to be used as a BY variable later */
       		lag2cvrank = XY[1,5];   /* extract lag2cvrank from XY, to be used as a BY variable later */

			/* Extract x from XY */
			X = XY_NEW[,{1 2}];       /* extract X from XY */ /* extract pairs, i.e, each combo in C */
			
			/* Extract y from XY */
			Y = XY_NEW[,3];       /* extract Y from XY */
			xpx = x`*x;    /*cross-products*/
			xpy = x`*y;
			
			/*solve linear system*/
			/*Solution 1: compute inverse with INV (inefficient)*/
			xpxi = inv(xpx);    /*form inverse crossproducts*/
			b = xpxi*xpy;    /*solve for parameter estimates*/
			* Or a better solution ***;
			/*Solution 2: compute solution with SOLVE. More efficient*/
			b = (solve(xpx, xpy))`;    /* solve for parameter estimates*/
			t = nrow(XY_NEW);           /* number of rows in XY */
			group_col      = J(c_rows, 1, group);
			lag2cvrank_col = J(c_rows, 1, lag2cvrank);
			If i=1 then b_storage=b_storage_first+b; 
			else b_storage=b_storage//b;
			print b_storage;
			end;
			return (b_storage || group_col || lag2cvrank_col);	
finish;

  /* read the BY groups */
  use FreqOut nobs NumGroups;
  read all var {group lag2cvrank};
  close FreqOut;

  use work.untrimmed;      
  create ts_fcst1_IML var {group_col lag2cvrank_col med_b1 med_b2};
  setin work.untrimmed;    
  setout ts_fcst1_IML;

  inVarNames = {"lag2cfo_ts" "lag2acr_ts" "lagcfo_ts" "group" "lag2cvrank"};
  do i = 1 to NumGroups;                    /* for each BY group */
     read all var inVarNames into XY 
        where(group=(group[i]) &amp;amp; lag2cvrank=(lag2cvrank[i]));
		/* X contains data for i_th group; analyze it */
	     G = regress(XY);
	 	/* extract the columns of the matrix */
		m=G[,1]; m1=G[,2]; group_col=G[1,3]; lag2cvrank_col=G[1,4]; /* m, m1 hold the slopes for ALL pairs; BY values come from row 1 */
		/* Compute median slope values from all pairs for group/lag2cvrank combinations */
		med_b1=median(m); 
		med_b2=median(m1); 
		append;
	
  end;

  close work.untrimmed;
  close ts_fcst1_IML;
quit;

proc print data=ts_fcst1_IML; run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 18 May 2020 13:55:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648555#M5109</guid>
      <dc:creator>rfrancis</dc:creator>
      <dc:date>2020-05-18T13:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648814#M5110</link>
      <description>&lt;P&gt;You say the IML solution is taking a lot longer, but there are probably ways in which it could be made more efficient. It is generally a bad idea to 'grow' a matrix inside a loop with syntax like:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;b_storage=b_storage//b;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;because this creates a new matrix on each iteration and moves data around in memory unnecessarily.&amp;nbsp; It is better to create b_storage at its final size before the loop and then write one row at a time.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am guessing the data step solution you mention uses a shortcut to compute the regression coefficients directly, since there are only two points and no error.&amp;nbsp; You could do the same in IML to speed things up.&amp;nbsp; For an even faster solution, the computation could be vectorized to eliminate the loop over the combinations.&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 12:56:17 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648814#M5110</guid>
      <dc:creator>IanWakeling</dc:creator>
      <dc:date>2020-05-19T12:56:17Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648815#M5111</link>
      <description>&lt;P&gt;To learn more about Ian's comment, see the article &lt;A href="https://blogs.sas.com/content/iml/2015/02/16/friends-dont-let-friends-concatenate-results-inside-a-loop.html" target="_self"&gt;"Friends don't let friends concatenate results inside a loop."&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'll also mention that you are solving the regression problem twice in every loop. You can delete the unnecessary statements for Solution 1.&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 13:04:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648815#M5111</guid>
      <dc:creator>Rick_SAS</dc:creator>
      <dc:date>2020-05-19T13:04:54Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648828#M5112</link>
      <description>&lt;P&gt;Thank you, Ian. There is no question that my code is inefficient; I simply have little knowledge of IML. Creating the matrix at its final size would be great if I only knew how to make this happen. The data step code that I mentioned consists of PROC REG inside a macro loop, nothing special. With help from a system administrator to increase MEMSIZE to 12 GB, the data step code executes fairly quickly, and I suspect that the IML code with a few tweaks would be comparable. I apologize if it seemed I was criticizing IML for the slow processing time. Thanks again for your time!&amp;nbsp; Rick&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 13:33:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648828#M5112</guid>
      <dc:creator>rfrancis</dc:creator>
      <dc:date>2020-05-19T13:33:45Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648830#M5113</link>
      <description>&lt;P&gt;Thanks, Rick!&amp;nbsp; Your link shows exactly how to do this. Solving the regression problem twice is definitely increasing the processing time.&amp;nbsp; Thanks again! Rick&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 13:36:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648830#M5113</guid>
      <dc:creator>rfrancis</dc:creator>
      <dc:date>2020-05-19T13:36:34Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648864#M5114</link>
      <description>&lt;P&gt;My suggestion is to declare the storage immediately before the loop like so:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;b_storage = j(c_rows, 2, .);&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That is a matrix with as many rows as combinations and 2 columns, filled with missing values.&amp;nbsp; Once you have the matrix b in the loop, copy it to the ith row of the storage matrix as follows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;b_storage[i, ] = b;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The if/then/else statement can then be deleted, as can the definition of matrix b_storage_first.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Incidentally, the statements that define group_col and lag2cvrank_col can be moved outside of the loop; they are being overwritten on every iteration, so moving them will save a bit more time.&lt;/P&gt;
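&lt;P&gt;Putting these suggestions together with Rick's earlier note about dropping Solution 1, a revised regress module might look like the following. This is a sketch. It assumes the same column order as inVarNames in the original program (lag2cfo_ts, lag2acr_ts, lagcfo_ts, group, lag2cvrank). The small XY matrix at the end is taken from the sample data for illustration only.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc iml;
start regress(XY);
   /* XY columns: lag2cfo_ts lag2acr_ts lagcfo_ts group lag2cvrank */
   c = allcomb(nrow(XY), 2);          /* all "N choose 2" pairs of rows */
   c_rows = nrow(c);
   b_storage = j(c_rows, 2, .);       /* allocate once, at final size */
   do i = 1 to c_rows;
      XY_NEW = XY[c[i,], ];           /* the two rows for pair i */
      X = XY_NEW[, {1 2}];
      Y = XY_NEW[, 3];
      b = (solve(X`*X, X`*Y))`;       /* one solve per pair (Solution 2 only) */
      b_storage[i, ] = b;             /* fill row i; no concatenation */
   end;
   /* BY values are constant within XY, so build these columns once */
   group_col      = j(c_rows, 1, XY[1,4]);
   lag2cvrank_col = j(c_rows, 1, XY[1,5]);
   return (b_storage || group_col || lag2cvrank_col);
finish;

/* Illustration: subgroup 1991 / lag2cvrank=0 from the sample data */
XY = {165 35 155 1991 0,
      250 75 200 1991 0,
      135 65  75 1991 0};
G = regress(XY);
med_b1 = median(G[,1]);               /* median over all pairs */
med_b2 = median(G[,2]);
print G, med_b1 med_b2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The module returns the same four columns as before, so the main BY-group loop can keep calling G = regress(XY) unchanged.&lt;/P&gt;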
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Finally, have you thought about what happens if two data rows have identical pairs of x-values?&lt;/P&gt;
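&lt;P&gt;One way to combine the "shortcut" and vectorization ideas from earlier in this thread with a guard against that case is Cramer's rule. For rows i and j, the no-intercept fit solves the 2 x 2 system B1*x1 + B2*x2 = y exactly. The determinant is det = x1[i]*x2[j] - x1[j]*x2[i]. The estimates are B1 = (y[i]*x2[j] - y[j]*x2[i])/det and B2 = (x1[i]*y[j] - x1[j]*y[i])/det. When two rows have identical (or proportional) x-values, det is zero and the pair has no unique solution. The sketch below computes all pairs at once and leaves singular pairs as missing. The module name PairSlopes is hypothetical, and the absolute tolerance tol is an assumption that you would need to choose for the scale of your data.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc iml;
/* Sketch: all pairwise no-intercept estimates without a loop over pairs. */
/* PairSlopes and the tolerance tol are hypothetical choices.             */
start PairSlopes(X, y, tol);
   c  = allcomb(nrow(X), 2);
   r1 = c[, 1];  r2 = c[, 2];            /* first and second row of each pair */
   x1i = X[r1, 1];  x2i = X[r1, 2];  yi = y[r1, 1];
   x1j = X[r2, 1];  x2j = X[r2, 2];  yj = y[r2, 1];
   denom = x1i#x2j - x1j#x2i;            /* 2 x 2 determinant for every pair */
   b = j(nrow(c), 2, .);                 /* missing = no unique solution */
   ok = loc(abs(denom) &amp;gt; tol);      /* pairs with a usable determinant */
   if ncol(ok) &amp;gt; 0 then do;
      b[ok, 1] = (yi[ok,]#x2j[ok,] - yj[ok,]#x2i[ok,]) / denom[ok,];
      b[ok, 2] = (x1i[ok,]#yj[ok,] - x1j[ok,]#yi[ok,]) / denom[ok,];
   end;
   return (b);
finish;

/* Subgroup 1991 / lag2cvrank=0 from the sample data */
X = {165 35, 250 75, 135 65};            /* lag2cfo_ts lag2acr_ts */
y = {155, 200, 75};                      /* lagcfo_ts */
b = PairSlopes(X, y, 1e-8);
med_b1 = median(b[,1]);                  /* assumes MEDIAN skips missing values */
med_b2 = median(b[,2]);
print b, med_b1 med_b2;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Each pair costs only a few elementwise multiplications and one division, so SOLVE is never called inside a loop. The trade-off is that singular or nearly singular pairs must be screened explicitly, which the tolerance does here.&lt;/P&gt;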
</description>
      <pubDate>Tue, 19 May 2020 14:49:18 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648864#M5114</guid>
      <dc:creator>IanWakeling</dc:creator>
      <dc:date>2020-05-19T14:49:18Z</dc:date>
    </item>
    <item>
      <title>Re: OLS with no intercept by pairs of observations</title>
      <link>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648887#M5115</link>
      <description>&lt;P&gt;Thanks, Ian!&amp;nbsp; I see how you are updating the matrix in the loop; it makes a lot of sense. The nature of the data (10-K filings) makes duplicate values very unlikely. I have posed this very question to the big dog who developed the data step code, and he is unconcerned, to say the least.&amp;nbsp; But your point is well taken nonetheless. Thank you again for your time and willingness to help.&amp;nbsp; Rick&lt;/P&gt;</description>
      <pubDate>Tue, 19 May 2020 15:44:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-IML-Software-and-Matrix/OLS-with-no-intercept-by-pairs-of-observations/m-p/648887#M5115</guid>
      <dc:creator>rfrancis</dc:creator>
      <dc:date>2020-05-19T15:44:50Z</dc:date>
    </item>
  </channel>
</rss>

