<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sincere request for efficient code for inner products in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427533#M105466</link>
    <description>&lt;P&gt;Here's a data step that likely takes a lot less memory than your prod sql.&amp;nbsp; I don't have sas on this machine, so the code is untested:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit Note: I moved the "inner_prod=0;" statement inside the do loop, where it belongs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (keep=gvkey year gvkey2 year2  inner_prod);

  array _years {200000}     _temporary_;
  array _gvkeys {200000} $10 _temporary_;
  array _s {1000} s1-s1000;

  array matrix {200000,1000} _temporary_;

  SET HAVE;
  _years{_n_}=year;
  _gvkey{_n_}=gvkey;
  do i=1 to 1000;
    matrix{_n_,i}=_s{i};
  end;

  if _n_&amp;gt;1 then do n2=1 to _n_-1;
    gvkey2=_gvkeys{n2};
    year2=_years{n2};&lt;BR /&gt;    inner_prod=0;
    do i=1 to 1000;
      inner_prod + s{i}*matrix{n2,i};
    end;
    output;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The main memory consumer is the 2-dimensional temporary array named MATRIX, which requires 200,000*1,000 cells (=200m cells).&amp;nbsp; At 8 bytes per cell, that's 1.6GB of ram.&amp;nbsp; Plus the other memory requirements that's less than 2B.&amp;nbsp; Can you get that?&amp;nbsp; If so, then try the program.&lt;/P&gt;
&lt;P&gt;Notice this program doesn't require renaming variables.&amp;nbsp; Instead it stored vars S1-S1000 in MATRIX for retrieving when generating inner product with subsequent observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You mention firm-years and 1,000 variables, so it looks like you might be using Compustat data, so I included the compustat firm id variables GVKEY.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 15 Jan 2018 05:42:16 GMT</pubDate>
    <dc:creator>mkeintz</dc:creator>
    <dc:date>2018-01-15T05:42:16Z</dc:date>
    <item>
      <title>Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427518#M105455</link>
      <description>&lt;P&gt;I have a data of 200,000 firm-year observations with s1-s1000,1000 numeric variables (or, you can perceive them as 1-by-1000 vector for each firm-year observation). I would like to calculate the sum of each firm-year observation's 1-by-1000 vector's inner product with all other firm-year observations' 1&lt;SPAN&gt;-by-1000 vectors matching on the same year.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Since I find it is difficult to directly perform vector calculations in SAS for my case, I perform the simple multiplications by the following sas code:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;%macro process;&lt;/P&gt;&lt;P&gt;%do i=1980 %to 2017;&lt;/P&gt;&lt;P&gt;* I intend to avoid over-capacity of SAS by merging observations within each year;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;data t1;set x6;if year=&amp;amp;i;run;&lt;BR /&gt;data t2;set x6_copy;if year=&amp;amp;i;run;&lt;/P&gt;&lt;P&gt;* x6_copy.sas is the copy of x6.sas, except that I rename s1-s1000 to be ss1-ss1000 for merging;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sql;create table t3 as select a.*, b.* from t1 a left join t2 b on a.firm^=b.firm;quit;&lt;/P&gt;&lt;P&gt;* merge each firm-year with all other firm-years to add ss1-ss1000;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;data t3;set t3;array x {1000} s1-s1000;array y {1000} ss1-ss1000;&lt;BR /&gt;d=0;do i=1 to 1000;d=d+x{i}*y{i};end;drop i;run;&lt;/P&gt;&lt;P&gt;* mechanical calculation of inner product;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;proc sort data=t3;by&amp;nbsp;firm&amp;nbsp;year;run;&lt;BR /&gt;proc means data=t3 noprint;var d;by &lt;SPAN&gt;firm&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;year&lt;/SPAN&gt;;output out=folder.x&amp;amp;i sum=d;run;&lt;/P&gt;&lt;P&gt;* sum up across each firm-year;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;BR /&gt;proc datasets lib=work noprint;delete t1 t2 t3;quit;&lt;/P&gt;&lt;P&gt;%end;&lt;BR /&gt;%mend;&lt;BR /&gt;%process();&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;In practice, to avoid "insufficient space" problem, I even further divide 1000 variables into "300, 300, 300, 100" 4 groups to reduce the one-time workload of SAS. But it still takes my three computers at least two days to run the code. So I really appreciate it if anyone could let me know how I can improve my code to save some time. Thank you very much in advance!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 15:52:30 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427518#M105455</guid>
      <dc:creator>dayuan</dc:creator>
      <dc:date>2018-01-14T15:52:30Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427528#M105463</link>
      <description>&lt;P&gt;This is likely to work faster, with one computer and no splitting up of the data.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;First, build your own index that shows where each year begins and ends.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;proc sort data=x6;&lt;/P&gt;
&lt;P&gt;by year firm;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;data my_index;&lt;/P&gt;
&lt;P&gt;set x6 (keep=year);&lt;/P&gt;
&lt;P&gt;by year;&lt;/P&gt;
&lt;P&gt;if first.year then do;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;start = _n_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;end; = _n_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;fmtname = 'first_yr';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;if last.year then do;;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;start = _n_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;end = _n_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;fmtname = 'last_yr';&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;output;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;proc sort data=my_index;&lt;/P&gt;
&lt;P&gt;by fmtname;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;proc format cntlin=my_index;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Inspect the data set MY_INDEX, to get a feel for what you have created so far.&amp;nbsp; The two formats created (FIRST_YR and LAST_YR) will translate the year into the first and last observation number from X6 that belong to a given year.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Then use the formats to read in the required observations:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;data want;&lt;/P&gt;
&lt;P&gt;set x6;&lt;/P&gt;
&lt;P&gt;array s {1000};&lt;/P&gt;
&lt;P&gt;array t {1000} _temporary_;&lt;/P&gt;
&lt;P&gt;do _n_1 to 1000;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;t{_n_} = s{_n_};&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;first_match = input(put(year, first_yr.), 4.);&lt;/P&gt;
&lt;P&gt;last_match = input(put(year, last_yr), 4.);&lt;/P&gt;
&lt;P&gt;d = 0;&lt;/P&gt;
&lt;P&gt;do _n_=first_match to last_match;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;set x6 (rename=(firm=comparison_firm)) point=_n_;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;if firm ne comparison_firm then do j=1 to 1000;&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; d + s{j} * t{j};&lt;/P&gt;
&lt;P&gt;&amp;nbsp; &amp;nbsp;end;&lt;/P&gt;
&lt;P&gt;end;&lt;/P&gt;
&lt;P&gt;keep firm year d;&lt;/P&gt;
&lt;P&gt;run;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;It's untested code, so might need some tweaking.&amp;nbsp; But it should be in the ballpark.&amp;nbsp; This would give you 200,000 observations, with the summed variable (D) on each.&amp;nbsp; As always, test first on a smaller data set (all years, but for just 3 firms, for example).&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 17:49:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427528#M105463</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2018-01-14T17:49:52Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427529#M105464</link>
      <description>&lt;P&gt;Why not just transpose the data? Wouldn't that be easier? So if you had data like:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have ;
  input firm $ year var $ value ;
cards;
one 2017 x1 1
one 2017 x2 2
two 2017 x1 3
two 2017 x2 4 
;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;And you want to generate a new variable D that for year=2017 and firms 'one' and 'two' would be the sum 1*3 + 2*4.&lt;/P&gt;
&lt;P&gt;So that is simple to do with this structure.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;proc sql ;
  create table D as
    select a.firm, a.year, b.firm as with_firm
         , 'D' as var length=8
         , sum(a.value * b.value) as value
    from have a
    inner join have b
      on a.firm ne b.firm
      and a.year = b.year
      and a.var = b.var
    group by 1,2,3,4
  ;
quit;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;PRE&gt;                       with_
Obs    firm    year    firm     var    value

 1     one     2017     two      D       11
 2     two     2017     one      D       11&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 14 Jan 2018 18:14:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427529#M105464</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-01-14T18:14:52Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427533#M105466</link>
      <description>&lt;P&gt;Here's a data step that likely takes a lot less memory than your prod sql.&amp;nbsp; I don't have sas on this machine, so the code is untested:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Edit Note: I moved the "inner_prod=0;" statement inside the do loop, where it belongs.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want (keep=gvkey year gvkey2 year2  inner_prod);

  array _years {200000}     _temporary_;
  array _gvkeys {200000} $10 _temporary_;
  array _s {1000} s1-s1000;

  array matrix {200000,1000} _temporary_;

  SET HAVE;
  _years{_n_}=year;
  _gvkey{_n_}=gvkey;
  do i=1 to 1000;
    matrix{_n_,i}=_s{i};
  end;

  if _n_&amp;gt;1 then do n2=1 to _n_-1;
    gvkey2=_gvkeys{n2};
    year2=_years{n2};&lt;BR /&gt;    inner_prod=0;
    do i=1 to 1000;
      inner_prod + s{i}*matrix{n2,i};
    end;
    output;
  end;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;The main memory consumer is the 2-dimensional temporary array named MATRIX, which requires 200,000*1,000 cells (=200m cells).&amp;nbsp; At 8 bytes per cell, that's 1.6GB of ram.&amp;nbsp; Plus the other memory requirements that's less than 2B.&amp;nbsp; Can you get that?&amp;nbsp; If so, then try the program.&lt;/P&gt;
&lt;P&gt;Notice this program doesn't require renaming variables.&amp;nbsp; Instead it stored vars S1-S1000 in MATRIX for retrieving when generating inner product with subsequent observations.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You mention firm-years and 1,000 variables, so it looks like you might be using Compustat data, so I included the compustat firm id variables GVKEY.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jan 2018 05:42:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427533#M105466</guid>
      <dc:creator>mkeintz</dc:creator>
      <dc:date>2018-01-15T05:42:16Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427594#M105501</link>
      <description>&lt;P&gt;Hi, thank you so much for your great help! I tried your advanced code. It seemed that the only the problem is "ERROR: Missing LABEL variable." received when I run "proc format cntlin=my_index;run;" so that the "first_yr" and "last_yr" format could not be found or uploaded...While I am looking for the solution, if it comes to your mind quickly, may I know how can I overcome the error related to "proc format cntlin"?&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jan 2018 03:43:41 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427594#M105501</guid>
      <dc:creator>dayuan</dc:creator>
      <dc:date>2018-01-15T03:43:41Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427596#M105502</link>
      <description>&lt;P&gt;You could also use your existing data structure and just generate a really long SUM() function call to add the 1,000 products together. That will at least eliminate one loop.&amp;nbsp; Also no need to generate both A with B and B with A.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;create table want as 
  select a.firm, a.year, b.firm as with_firm 
       , sum(a.s1*b.s1,a.s2*b.s2
  ...
           ,a.s1000*b.s1000) as D
  from have a 
  inner join have b
  on a.year = b.year 
  and a.firm &amp;lt; b.firm
;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Mon, 15 Jan 2018 03:51:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427596#M105502</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2018-01-15T03:51:55Z</dc:date>
    </item>
    <item>
      <title>Re: Sincere request for efficient code for inner products</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427837#M105559</link>
      <description>&lt;P&gt;Really appreciate your help! I made things complicated at the very beginning...&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jan 2018 21:03:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sincere-request-for-efficient-code-for-inner-products/m-p/427837#M105559</guid>
      <dc:creator>dayuan</dc:creator>
      <dc:date>2018-01-15T21:03:23Z</dc:date>
    </item>
  </channel>
</rss>

