Help using Base SAS procedures

Combine large datasets

Reply
N/A
Posts: 0

Combine large datasets

Hi,

I am very new to SAS.

I have just tried to write a macro for pulling 2 years transaction data hitting the teradata DB. I had pulled monthly basis just for better performance on DB side.

I have combined all the datasets into a single at the end using proc append.

Following is the code:


%macro pull (bdate,edate,mthrange);

proc sql;
connect to oracle (user=zzzzzz pw=yyyyyy server=xxx);
create table in.data_&mthrange. as
select * from connection to teradata;
(
select * from trans_table
where trans_dt between &bdate and &edate
);
disconnect from teradata;
quit;


%mend pull;
%pull('2008-03-02','2008-05-31',Mar08_May08);
%pull('2008-06-01','2008-08-30',Jun08_Aug08);
%pull('2008-08-31','2008-10-04',Sep08);
%pull('2008-10-05','2008-11-04',Oct08);
%pull('2008-11-02','2008-11-29',Nov08);
%pull('2008-11-30','2009-01-03',Dec08);
%pull('2009-01-04','2009-02-28',Jan09_Feb09);
%pull('2009-03-01','2009-05-30',Mar09_May09);
%pull('2009-05-31','2009-08-29',Jun09_Aug09);
%pull('2009-08-30','2009-10-03',Sep09);
%pull('2009-10-04','2009-10-31',Oct09);
%pull('2009-11-01','2009-11-28',Nov09);
%pull('2009-11-29','2010-01-02',Dec09);
%pull('2010-01-03','2010-02-27',Jan10_Feb10);

%macro append(range);
proc append base=in.data data=in.data_&range.;
run;
%mend append;

%append(Mar08_May08);
%append(Jun08_Aug08);
%append(Sep08);
%append(Oct08);
%append(Nov08);
%append(Dec08);
%append(Jan09_Feb09);
%append(Mar09_May09);
%append(Jun09_Aug09);
%append(Sep09);
%append(Oct09);
%append(Nov09);
%append(Dec09);
%append(Jan10_Feb10);

Problem:- it is occupying space twice. as the Final dataset is the combined one. I used to delete the intermediate datasets after verifying the sum of counts of all the intermediate datasets to the final one. I just wanted to do that step programmatically in such a way that after each append its verying the counts and if success deleting the verified datasets and then going to the next step of combining. Also, in case the counts doesnot match it should throw an errror.

Please let me know the best possible solution for this. Message was edited by: sohit
Super User
Posts: 17,831

Re: Combine large datasets

Perhaps set it up something like below.
Hopefully that helps.
Cheers,
Reeza

%macro pull (bdate,edate,mthrange);

proc sql;
connect to oracle (user=zzzzzz pw=yyyyyy server=xxx);
create table in.data_&mthrange. as
select * from connection to teradata;
(
select * from trans_table
where trans_dt between &bdate and &edate
);
disconnect from teradata;
quit;

*Count records in in.date (see below)
*count records in in.data_&range. (see below)

proc append base=in.data data=in.data_&range.;
run;

*count records in.date (see below)

* verify the correct number of obs where added;



*drop table;
proc sql;
drop table in.data_⦥
quit;

%mend pull;


Can use this code to get the count of the number of observations.

proc sql noprint ;
select nobs - delobs into :numobs
from dictionary.tables *may need to change to sashelp.vtable
where libname="IN" and memname=DATASETNAME ;
quit ;
Valued Guide
Posts: 2,175

Re: Combine large datasets

sohit
had you considered creating the join within Teradata?

For minor changes to your process I would recommend
1 in macro %pull select into a table named without macro variables, then you have only one to remove
create table in.data_for_one_range as

2 create a vew rather than table then it takes no extra space
create view data_for_one_range as

3 include the proc append within %pull, to a table named as a macro parameter
%macro pull (bdate,edate,mthrange, collect= in.myExtract );
proc append base=&collect data= data_for_one_range ;
run;

4 add flag into the view indicating the date range selected
select *, "&bdate and &edate" AS sel_range length=20 from connection to teradata

Collect statistics about the extracts after all are appended, using proc freq or summary like:[pre]proc summary data= in.myExtract nway ;
class sel_range ;
output out= select_stats ;
run ;
proc print n ;
id sel_range ;
sum _freq_ ;
run; [/pre]

However,
I do not understand what you can expect where you say "... if success deleting the verified datasets and then going to the next step of combining. Also, in case the counts does not match it should throw an errror."
To what can you match and verify? SAS has been validated, but you don't trust SAS? How can you verify an extract from Teradata?
Super Contributor
Posts: 359

Re: Combine large datasets

I find it difficult to think that you are going to get better performance in SAS than running your join in a Terradata pass thru. If Terradata is running much faster with a monthly query, try using your macro to generate the Teradata with a series of UNION ALL joins, then bring the resultant table back to SAS.
I personnally find the optimizer in Terradata less effective than that of Oracle (IMHO) so performance is very dependant on your query matching the stucture of the indecies.
Ask a Question
Discussion stats
  • 3 replies
  • 224 views
  • 0 likes
  • 4 in conversation