<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Join/Merge 100,000 data files... Most efficient technique? in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341915#M78337</link>
    <description>&lt;P&gt;Since a SET statement will only accept datasets until the 32767-character limit for a single SAS statement is reached, if I ran into a situation like yours (which I would avoid like hell, BTW), I'd start by concatenating every 1000 datasets into one (creating 100 data steps dynamically), and then concatenate those 100 datasets into the final one.&lt;/P&gt;
&lt;P&gt;But I'd also try to tackle the problem from the other side, by not creating the 100,000 separate datasets in the first place; I must admit, though, that I have no clue how PROC LOAN works, as we don't have SAS/ETS licensed.&lt;/P&gt;</description>
    <pubDate>Fri, 17 Mar 2017 10:16:16 GMT</pubDate>
    <dc:creator>Kurt_Bremser</dc:creator>
    <dc:date>2017-03-17T10:16:16Z</dc:date>
    <item>
      <title>Join/Merge 100,000 data files... Most efficient technique?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341908#M78334</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I created 100,000 scheduled balance payment tables (one per ID) with PROC LOAN.&lt;BR /&gt;Every table has 3 columns: 1. id, 2. time, and 3. the scheduled payment values. Every table has 360 rows.&lt;BR /&gt;&lt;BR /&gt;Now I want these 100,000 tables merged by id and time onto my base panel dataset.&lt;BR /&gt;&lt;BR /&gt;I tried a data step MERGE with a loop written around my macro, but it would take ages.&lt;BR /&gt;Then I tried to create a full table with all 360*100,000 rows with a simple data step SET statement.&lt;BR /&gt;I also tried PROC APPEND, which was even slower.&lt;BR /&gt;&lt;BR /&gt;Is there an efficient way to do that?&lt;/P&gt;</description>
      <pubDate>Fri, 17 Mar 2017 09:59:38 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341908#M78334</guid>
      <dc:creator>kosmirnov</dc:creator>
      <dc:date>2017-03-17T09:59:38Z</dc:date>
    </item>
    <item>
      <title>Re: Join/Merge 100,000 data files... Most efficient technique?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341915#M78337</link>
      <description>&lt;P&gt;Since a SET statement will only accept datasets until the 32767-character limit for a single SAS statement is reached, if I ran into a situation like yours (which I would avoid like hell, BTW), I'd start by concatenating every 1000 datasets into one (creating 100 data steps dynamically), and then concatenate those 100 datasets into the final one.&lt;/P&gt;
&lt;P&gt;But I'd also try to tackle the problem from the other side, by not creating the 100,000 separate datasets in the first place; I must admit, though, that I have no clue how PROC LOAN works, as we don't have SAS/ETS licensed.&lt;/P&gt;
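&lt;P&gt;A minimal sketch of that two-pass idea. The names tab1-tab100000 and all_payments are placeholders; with non-sequential table names you'd build the inner loop from a list of the actual names instead:&lt;/P&gt;
&lt;PRE&gt;%macro combine_all;
  %local batch i;
  /* Pass 1: concatenate every 1000 tables into one intermediate dataset,
     so each generated SET statement stays well below the 32767-character limit */
  %do batch = 1 %to 100;
    data batch&amp;batch.;
      set
      %do i = %eval((&amp;batch. - 1) * 1000 + 1) %to %eval(&amp;batch. * 1000);
        tab&amp;i.
      %end;
      ;
    run;
  %end;
  /* Pass 2: concatenate the 100 intermediates into the final dataset */
  data all_payments;
    set batch1-batch100;
  run;
%mend combine_all;

%combine_all&lt;/PRE&gt;</description>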
      <pubDate>Fri, 17 Mar 2017 10:16:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341915#M78337</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2017-03-17T10:16:16Z</dc:date>
    </item>
    <item>
      <title>Re: Join/Merge 100,000 data files... Most efficient technique?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341920#M78339</link>
      <description>&lt;P&gt;As&amp;nbsp;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/11562"&gt;@Kurt_Bremser&lt;/a&gt;&amp;nbsp;has said, creating 100,000 tables doesn't sound like an efficient method of working. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;One thing you could do is SET all the files together into one long dataset, then transpose it:&lt;/P&gt;
&lt;PRE&gt;data f1;  /* two tiny example tables standing in for the 100,000 */
  id=1; time=2; payment=2;
run;
data f2;
  id=2; time=10; payment=5;
run;
data inter;
  set f:;  /* the : wildcard pulls in every dataset whose name starts with f */
run;
/* inter is already in id order here; otherwise sort it by id first */
proc transpose data=inter out=want prefix=time;
  by id;
  var payment;
  id time;      /* the values of time become the new column names */
  idlabel time;
run;&lt;/PRE&gt;
&lt;P&gt;Just as an example - you can use the : modifier to include all datasets whose names share a prefix, which saves typing them all out. &amp;nbsp;Do note, this wide layout isn't a good structure to program with - i.e. having all the data spread across columns. &amp;nbsp;Unless it's specifically for an output report, stick with the inter dataset, program with that, and only transpose for your output procedure - that will save you lots of messy coding.&lt;/P&gt;
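&lt;P&gt;And for the final step asked about in the original post - a minimal sketch of merging the stacked data onto the base panel by id and time (the dataset name base is a placeholder):&lt;/P&gt;
&lt;PRE&gt;proc sort data=inter;
  by id time;
run;
data panel;
  merge base (in=inbase) inter;  /* base must also be sorted by id and time */
  by id time;
  if inbase;  /* keep only observations present in the base panel */
run;&lt;/PRE&gt;</description>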
      <pubDate>Fri, 17 Mar 2017 10:31:40 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/341920#M78339</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2017-03-17T10:31:40Z</dc:date>
    </item>
    <item>
      <title>Re: Join/Merge 100,000 data files... Most efficient technique?</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/343624#M78906</link>
      <description>Hi guys,&lt;BR /&gt;&lt;BR /&gt;Thanks for your advice. Just to let you know, I did as Kurt_Bremser advised: I split the job into several smaller datasets, and then everything worked!&lt;BR /&gt;&lt;BR /&gt;Thanks!&lt;BR /&gt;&lt;BR /&gt;Konstantin</description>
      <pubDate>Thu, 23 Mar 2017 09:07:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Join-Merge-100-000-data-files-Most-efficient-technqiue/m-p/343624#M78906</guid>
      <dc:creator>kosmirnov</dc:creator>
      <dc:date>2017-03-23T09:07:42Z</dc:date>
    </item>
  </channel>
</rss>

