<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sum of a variable in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899771#M355609</link>
    <description>&lt;P&gt;You&amp;nbsp; say this is a moderately large data set, 0.5 billion observations. What will you do with the data after you have created an even larger data set by adding a variable to the data set (you now have two data sets of 0.5 billion observations)?&lt;/P&gt;
&lt;P&gt;There must be some reason to add such a variable. Perhaps if you discuss it now we can save steps or make the code more efficient.&lt;/P&gt;</description>
    <pubDate>Tue, 24 Oct 2023 12:09:33 GMT</pubDate>
    <dc:creator>ballardw</dc:creator>
    <dc:date>2023-10-24T12:09:33Z</dc:date>
    <item>
      <title>Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899747#M355596</link>
      <description>&lt;P&gt;I want to create a new variable with the sum of a variable by the category of a second variable in a single step.&lt;/P&gt;
&lt;P&gt;The data i have. I want to create sum of var by id.&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 96pt;" border="0" width="128" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="64" height="20" style="height: 15.0pt; width: 48pt;"&gt;id&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;var&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;5&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;6&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;2&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;3&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;9&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Data I want&lt;/P&gt;
&lt;TABLE style="border-collapse: collapse; width: 144pt;" border="0" width="192" cellspacing="0" cellpadding="0"&gt;
&lt;TBODY&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD width="64" height="20" style="height: 15.0pt; width: 48pt;"&gt;id&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;var&lt;/TD&gt;
&lt;TD width="64" style="width: 48pt;"&gt;sum&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;5&lt;/TD&gt;
&lt;TD align="right"&gt;11&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;1&lt;/TD&gt;
&lt;TD align="right"&gt;6&lt;/TD&gt;
&lt;TD align="right"&gt;11&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;14&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;3&lt;/TD&gt;
&lt;TD align="right"&gt;14&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR style="height: 15.0pt;"&gt;
&lt;TD height="20" align="right" style="height: 15.0pt;"&gt;2&lt;/TD&gt;
&lt;TD align="right"&gt;9&lt;/TD&gt;
&lt;TD align="right"&gt;14&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;I know it can be done in 2 steps: in the first step we can create the sum variable and save in a new dataset and in second step we need to merge it with the original dataset.&lt;/P&gt;
&lt;P&gt;I want to avoid this 2 steps process as my dataset is huge (0.5 billion observations).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:24:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899747#M355596</guid>
      <dc:creator>BayzidurRahman</dc:creator>
      <dc:date>2023-10-24T10:24:08Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899748#M355597</link>
      <description>&lt;P&gt;Is your have dataset already sorted by id as your sample indicates?&lt;/P&gt;
&lt;P&gt;Any solution will require to read the data twice - but it should be possible to only write it once. The main performance impact will be the sorting so really important to know if the source is already sorted by id.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Also of some relevance: How many obs will belong to a single ID - median and max?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:31:51 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899748#M355597</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2023-10-24T10:31:51Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899749#M355598</link>
      <description>&lt;P&gt;Try this&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
input id var;
datalines;
1 5 
1 6 
2 2 
2 3 
2 9 
;

proc sql;
   create table want as
   select id, var, sum(var) as sum
   from have
   group by id
   ;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:28:54 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899749#M355598</guid>
      <dc:creator>PeterClemmensen</dc:creator>
      <dc:date>2023-10-24T10:28:54Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899752#M355600</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could try the following:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input id var;
  
  datalines;
1 5
1 6
2 2
2 3
2 9
;


data want;
  do until(last.id);
    set have;
    by id;
    
    sum = sum(0, sum, var);
  end;

  do until(last.id);
    set have;
    by id;
    
    output;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; kind regards,&lt;/P&gt;
&lt;P&gt;Amir.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:32:28 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899752#M355600</guid>
      <dc:creator>Amir</dc:creator>
      <dc:date>2023-10-24T10:32:28Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899753#M355601</link>
      <description>&lt;P&gt;Sorting would not be a big deal but would be preferable avoid. Can we not use the notsorted option?&lt;/P&gt;
&lt;P&gt;there will be few 100s observations per unique ID.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:33:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899753#M355601</guid>
      <dc:creator>BayzidurRahman</dc:creator>
      <dc:date>2023-10-24T10:33:27Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899754#M355602</link>
      <description>&lt;P&gt;Thanks Amir.&lt;BR /&gt;how to modify the code if I want to do the sum by two id variables, say id1 and id2?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:35:59 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899754#M355602</guid>
      <dc:creator>BayzidurRahman</dc:creator>
      <dc:date>2023-10-24T10:35:59Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899755#M355603</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/437116"&gt;@BayzidurRahman&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;Sorting would not be a big deal but would be preferable avoid. Can we not use the notsorted option?&lt;/P&gt;
&lt;P&gt;there will be few 100s observations per unique ID.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;You need a sum by ID so sorting is required. Sorting will cause a major performance impact.&lt;/P&gt;
&lt;P&gt;If your have table needs to maintain all the source rows then a SQL with group by is likely not the right approach as it will "collapse" rows - and it will also do an implicit sort.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;To avoid sorting one could first create the aggregated data in a hash table and then lookup the aggregates from there. If you've got around 100 rows per group then the data should fit into a hash unless your id variable(s) are very long.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Update: Below how this could work using a hash table. If you have more than one key variable just add them to the hash - defineKey('id1','id2'). Things will work as long as the summary hash table can fit into memory (5M rows according to the info you shared should fit). IF the key variables are very long then one could also create a hash key over them and use this as key. If using md5 and with 5 million rows the hash table should consume less than 250MB (32 bytes for the hash key plus 8 bytes for the sum_var * 5M groups plus the "overhead").&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have;
  input id var;
  
  datalines;
1 5
1 6
2 2
2 3
2 9
;

data want(drop=_:);

  if _n_=1 then 
    do;
      if 0 then set have;
      length sum_var 8;
      dcl hash h1();
      h1.defineKey('id');
      h1.defineData('sum_var');
      h1.defineDone();
      do _i=1 to _nobs;
        set have point=_i nobs=_nobs;
        if h1.find() ne 0 then sum_var=var;
        else sum_var=sum(sum_var,var);
        _rc=h1.replace();
      end;
    end;

  set have;
  _rc=h1.find();

run;

proc print data=want;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Below how an approach using a hash key could look like. I'd only recommend taking this approach if the combined length of the key variables is significantly above 32 characters because creating the hash key is resource intensive.&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want(drop=_:);

  if _n_=1 then 
    do;
      if 0 then set have;
      length sum_var 8 _hash_key $32;
      dcl hash h1();
      h1.defineKey('_hash_key');
      h1.defineData('sum_var');
      h1.defineDone();
      do _i=1 to _nobs;
        set have point=_i nobs=_nobs;
        _hash_key=hashing('md5',catx('|',id1,id2));
        if h1.find() ne 0 then sum_var=var;
        else sum_var=sum(sum_var,var);
        _rc=h1.replace();
      end;
    end;

  set have;
  _hash_key=hashing('md5',catx('|',id1,id2));
  _rc=h1.find();

run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 24 Oct 2023 11:27:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899755#M355603</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2023-10-24T11:27:10Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899756#M355604</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I would suggest replacing &lt;FONT face="courier new,courier"&gt;last.id&lt;/FONT&gt; with &lt;FONT face="courier new,courier"&gt;last.id2&lt;/FONT&gt; and &lt;FONT face="courier new,courier"&gt;by id&lt;/FONT&gt; with &lt;FONT face="courier new,courier"&gt;by id2&lt;/FONT&gt;, as follows:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data want;
  do until(last.id2);
    set have;
    by id2;
    
    sum = sum(0, sum, var);
  end;

  do until(last.id2);
    set have;
    by id2;
    
    output;
  end;
run;
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;However, without knowing what the relationship is between id1 &amp;amp; id2 and not knowing what result you want accordingly it can be difficult to say. I would advise you supply the data you have and the data you want in the form of data steps as some of us have done to create your input data, then we can try to advise further, as required.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Thanks &amp;amp; kind regards,&lt;/P&gt;
&lt;P&gt;Amir.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 10:46:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899756#M355604</guid>
      <dc:creator>Amir</dc:creator>
      <dc:date>2023-10-24T10:46:03Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899771#M355609</link>
      <description>&lt;P&gt;You&amp;nbsp; say this is a moderately large data set, 0.5 billion observations. What will you do with the data after you have created an even larger data set by adding a variable to the data set (you now have two data sets of 0.5 billion observations)?&lt;/P&gt;
&lt;P&gt;There must be some reason to add such a variable. Perhaps if you discuss it now we can save steps or make the code more efficient.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:09:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899771#M355609</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-10-24T12:09:33Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899774#M355610</link>
      <description>&lt;P&gt;I want to run regression model on the newly generated variable. I will create multiple variables with different resolutions (eg., weekly total by participants, monthly total by participants etc ) and run separate models on them. I have daily data per participant.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:27:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899774#M355610</guid>
      <dc:creator>BayzidR</dc:creator>
      <dc:date>2023-10-24T12:27:57Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899775#M355611</link>
      <description>&lt;P&gt;The id1 and id2 are independent. Their combination creates unique groups.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:30:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899775#M355611</guid>
      <dc:creator>BayzidR</dc:creator>
      <dc:date>2023-10-24T12:30:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899777#M355612</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/442251"&gt;@BayzidR&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I want to run regression model on the newly generated variable. I will create multiple variables with different resolutions (eg., weekly total by participants, monthly total by participants &lt;EM&gt;etc&lt;/EM&gt; ) and run separate models on them. I have daily data per participant.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So, if I am understanding you properly, you don't need to find these sums on the 0.5 billion records once, you will need to do this many times, once for each regression you plan to run.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Do you need an output data set with the original 0.5 billion records, or just one record per ID?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;May I suggest an alternative approach? Regression on 0.5 billion records seems like overkill, you could randomly sample the data such that you now have (let's say) 50,000 IDs (and of course much fewer than 0.5 billion records), reducing the need for this computationally intensive finding of sums, without really losing much quality in the regression.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;But even here I am skeptical. Suppose you do all this different aggregating of data to get all these different sums, and then you fit many different regressions. Then what? What will you do after you have many different regressions, based upon many different time intervals? How will these regressions be used?&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:46:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899777#M355612</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-10-24T12:46:45Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899779#M355613</link>
      <description>&lt;P&gt;I don't need to run regression on 0.5b observations but only on the aggregate datasets. the 0.5b observations will be collapsed into 500 clusters and 60 to 120 time points (monthly or fortnightly).&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:54:34 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899779#M355613</guid>
      <dc:creator>BayzidR</dc:creator>
      <dc:date>2023-10-24T12:54:34Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899780#M355614</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/442251"&gt;@BayzidR&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I don't need to run regression on 0.5b observations but only on the aggregate datasets. the 0.5b observations will be collapsed into 500 clusters and 60 to 120 time points (monthly or fortnightly).&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;So, if I understand you properly, you don't need the data set you posted originally, you need just one record for each ID containing the sum. Is that correct?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I am thinking that collapsing 0.5 billion records into 500 clusters is also an extremely computationally intensive task, something that makes me shudder and it sounds to me that getting eaten by bears would be more pleasant (and faster). Anyway, again I think randomly sampling the data first would be a much better approach.&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 12:59:21 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899780#M355614</guid>
      <dc:creator>PaigeMiller</dc:creator>
      <dc:date>2023-10-24T12:59:21Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899797#M355623</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/442251"&gt;@BayzidR&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I want to run regression model on the newly generated variable. I will create multiple variables with different resolutions (eg., weekly total by participants, monthly total by participants etc ) and run separate models on them. I have daily data per participant.&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;But you did not request an "aggregate data set". You requested add a value to all records of an existing data set.&lt;/P&gt;
&lt;P&gt;IF you have a variable to indicate the DATE (if you don't there is something wrong) of the daily data then you can create summaries by calendar month by using the proper format in Proc Summary to aggregate the data.&lt;/P&gt;
&lt;P&gt;A large data set should be sorted by the "grouping" variables to use the following code.&lt;/P&gt;
&lt;P&gt;Example:&lt;/P&gt;
&lt;PRE&gt;data example;
   do id=1,2,3;
       do date= '01Jan1999'd to '23OCT2023'd;
         somevalue = rand('integer',1000);
         output;
       end;
   end;
   format date date9.;
run;

/* proc sort by id date to make sure your data is in order*/
 
proc summary data=example;
   by id date;
   format date Monyy7.;
   var somevalue ;
   output out=monthsummary (drop=_type_) sum=;
run;&lt;/PRE&gt;
&lt;P&gt;The output data set will contain a variable _freq_ with the count of observations used. If you feel a need can rename the sum variable whatever you want, the above uses the same variable name as the Var. Which may be useful so you don't have to change names of variables in the various regressions just use the correct input set.&lt;/P&gt;
&lt;P&gt;to create calendar weekly one way:&lt;/P&gt;
&lt;PRE&gt; 
proc summary data=example;
   by id date;
   format date weeku6.;
   var somevalue ;
   output out=weeksummary (drop=_type_) sum=;
run;&lt;/PRE&gt;
&lt;P&gt;Week appearance with the above will be a 2-digit year followed by W then the number of the week in the year. There are 3 related formats, WeekU WeekV and WeekW that have differences on when a "week" starts and how the transition across years works.&lt;/P&gt;
&lt;P&gt;If you Print the data with a DATE9 or similar format assigned the DATE in the result you should get the first day of the "week" as the value.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Formats can create groups of any variable and will be honored by procedures like Summary, Freq and most analysis, reporting or graphing procedures (caution with custom date formats and graphing may not always be as expected).&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 14:43:24 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899797#M355623</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2023-10-24T14:43:24Z</dc:date>
    </item>
    <item>
      <title>Re: Sum of a variable</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899805#M355628</link>
      <description>&lt;P&gt;If you have time data I would recommend using PROC TIMESERIES to accumulate your data. It can take into account time periods better than proc summary or proc means.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;A href="https://documentation.sas.com/doc/en/etscdc/14.2/etsug/etsug_timeseries_examples01.htm" target="_blank"&gt;https://documentation.sas.com/doc/en/etscdc/14.2/etsug/etsug_timeseries_examples01.htm&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/442251"&gt;@BayzidR&lt;/a&gt;&amp;nbsp;wrote:&lt;BR /&gt;
&lt;P&gt;I don't need to run regression on 0.5b observations but only on the aggregate datasets. the 0.5b observations will be collapsed into 500 clusters and 60 to 120 time points (monthly or fortnightly).&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 24 Oct 2023 15:04:45 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-of-a-variable/m-p/899805#M355628</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2023-10-24T15:04:45Z</dc:date>
    </item>
  </channel>
</rss>

