<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Sum down a column using information from 2 unmerged datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398951#M96570</link>
    <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112197"&gt;@pamplemouse22&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;What does "very large" mean? 1, 10, 100, ... millions of rows?&lt;/P&gt;</description>
    <pubDate>Tue, 26 Sep 2017 21:19:08 GMT</pubDate>
    <dc:creator>Patrick</dc:creator>
    <dc:date>2017-09-26T21:19:08Z</dc:date>
    <item>
      <title>Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398228#M96298</link>
      <description>&lt;P&gt;Hi all,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Data management question. I am trying to create a dataset using information from two datasets (not merged) and would like some suggestions how to complete this task. I'd like to sum down a column, by group, from dataset "have_a", using&amp;nbsp;a condition that uses data in dataset "have_b". Is this possible without merging the two datasets?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to generate dataset "want" by computing the average count from dataset "have_a" over the date range specified in dataset "have_b".&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The following describes what I have and want. I hope this is clear. Thank you!!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;	data have_a;
	input group count date; 
	datalines; 
	1 50 01JAN2005
	1 51 02JAN2005
	1 49 03JAN2005
	1 34 04JAN2005
	1 34 05JAN2005
	;
	run;

	data have_b; 
	input group date_start date_end; 
	datalines; 
	1 01JAN2005 05JAN2005
	1 02JAN2005 05JAN2005
	; 
	run;

	data want; 
	input group date_start date_end mean; 
	datalines; 
	1 01JAN2005 05JAN2005 43.6
	1 02JAN2005 05JAN2005 42.0
	;
	run;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 22 Sep 2017 21:38:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398228#M96298</guid>
      <dc:creator>pamplemouse22</dc:creator>
      <dc:date>2017-09-22T21:38:42Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398242#M96305</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112197"&gt;@pamplemouse22&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;You will have to combine the data somehow in order to make the grouping information from your 2nd dataset available to your first dataset.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;You could create a format from your 2nd dataset (using Proc Format / CNTLIN) and then directly use this format in one of the SAS procs for calculation of aggregated results.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Sep 2017 22:48:02 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398242#M96305</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-09-22T22:48:02Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398243#M96306</link>
      <description>&lt;P&gt;Not the most efficient:&lt;/P&gt;
&lt;PRE&gt;proc sql;
   create table want as
   select distinct have_b.*, mean(have_a.count) as mean
   from have_b, have_a
   where  have_b.date_start le have_a.date le have_b.date_end
   group by have_b.date_start,  have_b.date_end
   ;
quit;

&lt;/PRE&gt;
&lt;P&gt;Note that this requires all of the date variables to be SAS date valued variables.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2017 15:02:56 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398243#M96306</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-09-26T15:02:56Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398250#M96309</link>
      <description>&lt;P&gt;Hi Patrick,&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks! The only way I could think of merging the data is to create a binary variable "start"-1/0 and then have a single date variable and then I can merge by date, but this would involve making my dataset larger than it is - not sure I want to do that. the proc sql option below seems like a good option ! Am going to try that. Thanks for your help.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Sep 2017 01:32:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398250#M96309</guid>
      <dc:creator>pamplemouse22</dc:creator>
      <dc:date>2017-09-23T01:32:27Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398251#M96310</link>
      <description>&lt;P&gt;Thank you so much! This is exactly what I was looking for. I have a few more conditions that I didn't describe that are needed to make the merge, so I will attempt adjusting your example code, accordingly. Thanks for getting it started! I appreciate it. Might be back with additional questions.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 23 Sep 2017 01:33:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398251#M96310</guid>
      <dc:creator>pamplemouse22</dc:creator>
      <dc:date>2017-09-23T01:33:53Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398619#M96445</link>
      <description>&lt;P&gt;Hi ballardw,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Can you please explain to me how this part of your code pulls the right data for the calculation? Thank you!&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt; where  have_b.date_start le have_a.date le have_b.date_end
   group by have_b.date_start,  have_b.date_end&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 25 Sep 2017 17:15:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398619#M96445</guid>
      <dc:creator>pamplemouse22</dc:creator>
      <dc:date>2017-09-25T17:15:25Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398834#M96504</link>
      <description>&lt;BLOCKQUOTE&gt;&lt;HR /&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112197"&gt;@pamplemouse22&lt;/a&gt; wrote:&lt;BR /&gt;
&lt;P&gt;Hi ballardw,&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you please explain to me how this part of your code pulls the right data for the calculation? Thank you!&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt; where  have_b.date_start le have_a.date le have_b.date_end
   group by have_b.date_start,  have_b.date_end&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;HR /&gt;&lt;/BLOCKQUOTE&gt;
&lt;P&gt;The where clause restricts the data to where the "have_a" data set date is between the the start and end dates in the "have_b" data set allowing the endpoints to match.&lt;/P&gt;
&lt;P&gt;The group by takes the results creating a "group" for the combinations of date_start and date_end for the summarization function&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The reason I say this isn't the most effecient (in potential run time) is that this code combines every record in Have_b with every record in Have_a, tests to see if the Have_a data is the desired interval (if you have 20 records in Have_b and 100 in Have_a that is 20*100 comparisons which can get large quickly =&amp;gt; slow run time), groups by the desired interval and then summarizes calculating the mean.&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2017 15:02:37 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398834#M96504</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2017-09-26T15:02:37Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398844#M96506</link>
      <description>&lt;P&gt;That is very clear, thank you. I am testing this out with a small sample of my data (and it works perfectly), but, the real dataset is very large.&amp;nbsp;I will allow ample time for running.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Desired intervals at the moment vary because of missing data, but with some more steps I can create a standard interval (where some of the date values are missing instead of being skipped).&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;If there are quick adjustments to make this process more efficient, suggestions welcome.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you so much! Appreciate it.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2017 15:39:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398844#M96506</guid>
      <dc:creator>pamplemouse22</dc:creator>
      <dc:date>2017-09-26T15:39:40Z</dc:date>
    </item>
    <item>
      <title>Re: Sum down a column using information from 2 unmerged datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398951#M96570</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/112197"&gt;@pamplemouse22&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;What does "very large" mean? 1, 10, 100, ... millions of rows?&lt;/P&gt;</description>
      <pubDate>Tue, 26 Sep 2017 21:19:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Sum-down-a-column-using-information-from-2-unmerged-datasets/m-p/398951#M96570</guid>
      <dc:creator>Patrick</dc:creator>
      <dc:date>2017-09-26T21:19:08Z</dc:date>
    </item>
  </channel>
</rss>

