<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Combine large datasets in SAS Procedures</title>
    <link>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28131#M6536</link>
    <description>I find it difficult to think that you are going to get better performance in SAS than running your join in a Terradata pass thru.  If Terradata is running much faster with a monthly query, try using your macro to generate the Teradata with a series of UNION ALL joins, then bring the resultant table back to SAS.&lt;BR /&gt;
I personnally find the optimizer in Terradata less effective than that of Oracle (IMHO) so performance is very dependant on your query matching the stucture of the indecies.</description>
    <pubDate>Tue, 01 Jun 2010 12:53:22 GMT</pubDate>
    <dc:creator>Flip</dc:creator>
    <dc:date>2010-06-01T12:53:22Z</dc:date>
    <item>
      <title>Combine large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28128#M6533</link>
      <description>Hi,&lt;BR /&gt;
&lt;BR /&gt;
I am very new to SAS.&lt;BR /&gt;
&lt;BR /&gt;
I have just tried to write a macro for pulling 2 years transaction data hitting the teradata DB. I had pulled monthly basis just for better performance  on DB side.&lt;BR /&gt;
&lt;BR /&gt;
I have combined all the datasets into a single at the end using proc append.&lt;BR /&gt;
&lt;BR /&gt;
Following is the code:&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
%macro pull (bdate,edate,mthrange);&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
connect to oracle (user=zzzzzz pw=yyyyyy server=xxx);&lt;BR /&gt;
create table in.data_&amp;amp;mthrange. as&lt;BR /&gt;
select * from connection to teradata;&lt;BR /&gt;
(&lt;BR /&gt;
select * from trans_table&lt;BR /&gt;
where trans_dt between &amp;amp;bdate and &amp;amp;edate&lt;BR /&gt;
);&lt;BR /&gt;
disconnect from teradata;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
%mend pull;&lt;BR /&gt;
%pull('2008-03-02','2008-05-31',Mar08_May08);&lt;BR /&gt;
%pull('2008-06-01','2008-08-30',Jun08_Aug08);&lt;BR /&gt;
%pull('2008-08-31','2008-10-04',Sep08);&lt;BR /&gt;
%pull('2008-10-05','2008-11-04',Oct08);&lt;BR /&gt;
%pull('2008-11-02','2008-11-29',Nov08);&lt;BR /&gt;
%pull('2008-11-30','2009-01-03',Dec08);&lt;BR /&gt;
%pull('2009-01-04','2009-02-28',Jan09_Feb09);&lt;BR /&gt;
%pull('2009-03-01','2009-05-30',Mar09_May09);&lt;BR /&gt;
%pull('2009-05-31','2009-08-29',Jun09_Aug09);&lt;BR /&gt;
%pull('2009-08-30','2009-10-03',Sep09);&lt;BR /&gt;
%pull('2009-10-04','2009-10-31',Oct09);&lt;BR /&gt;
%pull('2009-11-01','2009-11-28',Nov09);&lt;BR /&gt;
%pull('2009-11-29','2010-01-02',Dec09);&lt;BR /&gt;
%pull('2010-01-03','2010-02-27',Jan10_Feb10);&lt;BR /&gt;
&lt;BR /&gt;
%macro append(range);&lt;BR /&gt;
proc append base=in.data data=in.data_&amp;amp;range.;&lt;BR /&gt;
run;&lt;BR /&gt;
%mend append;&lt;BR /&gt;
&lt;BR /&gt;
%append(Mar08_May08);&lt;BR /&gt;
%append(Jun08_Aug08);&lt;BR /&gt;
%append(Sep08);&lt;BR /&gt;
%append(Oct08);&lt;BR /&gt;
%append(Nov08);&lt;BR /&gt;
%append(Dec08);&lt;BR /&gt;
%append(Jan09_Feb09);&lt;BR /&gt;
%append(Mar09_May09);&lt;BR /&gt;
%append(Jun09_Aug09);&lt;BR /&gt;
%append(Sep09);&lt;BR /&gt;
%append(Oct09);&lt;BR /&gt;
%append(Nov09);&lt;BR /&gt;
%append(Dec09);&lt;BR /&gt;
%append(Jan10_Feb10);&lt;BR /&gt;
&lt;BR /&gt;
Problem:- it is occupying space twice. as the Final dataset is the combined one. I used to delete the intermediate datasets after verifying the sum of counts of all the intermediate datasets to the final one. I just wanted to do that step programmatically in such a way that after each append its verying the counts and if success deleting the verified datasets and then going to the next step of combining. Also, in case the counts doesnot match it should throw an errror.&lt;BR /&gt;
&lt;BR /&gt;
Please let me know the best possible solution for this.

Message was edited by: sohit</description>
      <pubDate>Thu, 27 May 2010 12:04:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28128#M6533</guid>
      <dc:creator>deleted_user</dc:creator>
      <dc:date>2010-05-27T12:04:16Z</dc:date>
    </item>
    <item>
      <title>Re: Combine large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28129#M6534</link>
      <description>Perhaps set it up something like below. &lt;BR /&gt;
Hopefully that helps.&lt;BR /&gt;
Cheers,&lt;BR /&gt;
Reeza&lt;BR /&gt;
&lt;BR /&gt;
%macro pull (bdate,edate,mthrange);&lt;BR /&gt;
&lt;BR /&gt;
proc sql;&lt;BR /&gt;
connect to oracle (user=zzzzzz pw=yyyyyy server=xxx);&lt;BR /&gt;
create table in.data_&amp;amp;mthrange. as&lt;BR /&gt;
select * from connection to teradata;&lt;BR /&gt;
(&lt;BR /&gt;
select * from trans_table&lt;BR /&gt;
where trans_dt between &amp;amp;bdate and &amp;amp;edate&lt;BR /&gt;
);&lt;BR /&gt;
disconnect from teradata;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
*Count records in in.date (see below)&lt;BR /&gt;
*count records in in.data_&amp;amp;range. (see below)&lt;BR /&gt;
&lt;BR /&gt;
proc append base=in.data data=in.data_&amp;amp;range.;&lt;BR /&gt;
run;&lt;BR /&gt;
&lt;BR /&gt;
*count records in.date (see below)&lt;BR /&gt;
&lt;BR /&gt;
* verify the correct number of obs where added;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
*drop table;&lt;BR /&gt;
proc sql;&lt;BR /&gt;
drop table in.data_&amp;amp;range;&lt;BR /&gt;
quit;&lt;BR /&gt;
&lt;BR /&gt;
%mend pull;&lt;BR /&gt;
&lt;BR /&gt;
&lt;BR /&gt;
Can use this code to get the count of the number of observations.&lt;BR /&gt;
&lt;BR /&gt;
 proc sql noprint ; &lt;BR /&gt;
     select nobs - delobs into :numobs &lt;BR /&gt;
       from dictionary.tables  *may need to change to sashelp.vtable&lt;BR /&gt;
       where libname="IN" and memname=DATASETNAME ; &lt;BR /&gt;
   quit ;</description>
      <pubDate>Thu, 27 May 2010 16:25:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28129#M6534</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2010-05-27T16:25:50Z</dc:date>
    </item>
    <item>
      <title>Re: Combine large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28130#M6535</link>
      <description>sohit&lt;BR /&gt;
had you considered creating the join within Teradata? &lt;BR /&gt;
 &lt;BR /&gt;
For minor changes to your process I would recommend&lt;BR /&gt;
1 in macro %pull select into a table named without macro variables, then you have only one to remove&lt;BR /&gt;
create table in.data_for_one_range  as&lt;BR /&gt;
 &lt;BR /&gt;
2 create a vew rather than table then it takes no extra space &lt;BR /&gt;
create  view    data_for_one_range  as&lt;BR /&gt;
 &lt;BR /&gt;
3 include the proc append within %pull, to a table named as a macro parameter&lt;BR /&gt;
%macro pull (bdate,edate,mthrange, collect= in.myExtract );&lt;BR /&gt;
proc append base=&amp;amp;collect data= data_for_one_range ;&lt;BR /&gt;
run;&lt;BR /&gt;
 &lt;BR /&gt;
4 add flag into the view indicating the date range selected&lt;BR /&gt;
select *, "&amp;amp;bdate and &amp;amp;edate" AS sel_range length=20 from connection to teradata&lt;BR /&gt;
 &lt;BR /&gt;
Collect statistics about the extracts after all are appended, using proc freq or summary like:[pre]proc summary data= in.myExtract nway ;&lt;BR /&gt;
   class  sel_range ;&lt;BR /&gt;
   output out= select_stats ;&lt;BR /&gt;
run ;&lt;BR /&gt;
proc print  n ;&lt;BR /&gt;
    id sel_range ;&lt;BR /&gt;
   sum _freq_ ;&lt;BR /&gt;
run; [/pre]&lt;BR /&gt;
 &lt;BR /&gt;
However,&lt;BR /&gt;
I do not understand what you can expect where you say "... &lt;I&gt;if success deleting the verified datasets and then going to the next step of combining. Also, in case the counts does not match it should throw an errror.&lt;/I&gt;"&lt;BR /&gt;
To what can you match and verify?  SAS has been validated, but you don't trust SAS? How can you verify an extract from Teradata?</description>
      <pubDate>Mon, 31 May 2010 10:01:15 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28130#M6535</guid>
      <dc:creator>Peter_C</dc:creator>
      <dc:date>2010-05-31T10:01:15Z</dc:date>
    </item>
    <item>
      <title>Re: Combine large datasets</title>
      <link>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28131#M6536</link>
      <description>I find it difficult to think that you are going to get better performance in SAS than running your join in a Terradata pass thru.  If Terradata is running much faster with a monthly query, try using your macro to generate the Teradata with a series of UNION ALL joins, then bring the resultant table back to SAS.&lt;BR /&gt;
I personnally find the optimizer in Terradata less effective than that of Oracle (IMHO) so performance is very dependant on your query matching the stucture of the indecies.</description>
      <pubDate>Tue, 01 Jun 2010 12:53:22 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Procedures/Combine-large-datasets/m-p/28131#M6536</guid>
      <dc:creator>Flip</dc:creator>
      <dc:date>2010-06-01T12:53:22Z</dc:date>
    </item>
  </channel>
</rss>

