
Join/Merge 100,000 data files.. Most efficient technique?

Occasional Contributor
Posts: 7

Hi,

 

I created 100,000 scheduled balance payment tables (one per ID) with the PROC LOAN procedure.
Every table has 360 rows and three columns: id, time, and the scheduled payment value.

Now I want these 100,000 tables merged by id and time to my base panel dataset.

I tried a DATA step MERGE with a loop written around my macro, but it would take ages.
Then I tried to create a full table with all 360 * 100,000 rows with a simple DATA step SET statement.
I also tried PROC APPEND, which was even slower.

Is there an efficient way to do this?


Accepted Solutions
Solution
‎03-23-2017 05:09 AM
Super User
Posts: 6,928

Re: Join/Merge 100,000 data files.. Most efficient technique?

A SET statement will only accept as many datasets as fit within the 32767-character limit for a single SAS statement. So if I ran into a situation like yours (which I would avoid like hell, BTW), I'd start by combining every 1,000 datasets into one (generating the 100 data steps dynamically), and then combine those 100 intermediate datasets into the final one.
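A minimal macro sketch of that two-stage approach. The table naming (WORK.T1 through WORK.T100000) is an assumption, not from the original post; adjust it to match your actual names.

```sas
/* Two-stage combine: assumes the 100,000 tables are named t1-t100000. */
%macro combine_batches(n_tables=100000, batch_size=1000);
  %local batch first last i n_batches;
  %let n_batches = %eval(&n_tables / &batch_size);

  /* Stage 1: combine each group of 1,000 tables into one batch dataset.
     Each generated SET statement lists 1,000 names, well under 32767 chars. */
  %do batch = 1 %to &n_batches;
    %let first = %eval((&batch - 1) * &batch_size + 1);
    %let last  = %eval(&batch * &batch_size);
    data batch&batch;
      set
      %do i = &first %to &last;
        t&i
      %end;
      ;
    run;
  %end;

  /* Stage 2: combine the 100 batch datasets into the final dataset. */
  data all_schedules;
    set
    %do batch = 1 %to &n_batches;
      batch&batch
    %end;
    ;
  run;
%mend combine_batches;

%combine_batches()
```

Once ALL_SCHEDULES exists, sort it by id and time and MERGE it with the base panel dataset in a single step.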

But I'd try to tackle the problem from the other side, by not creating the 100,000 separate datasets in the first place. I must admit, though, that I have no clue how PROC LOAN works, as we don't have SAS/ETS licensed.
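For illustration only, here is one way to sidestep the 100,000 datasets entirely: compute the schedules for all IDs in a single DATA step instead of running the procedure once per ID. This sketch uses a standard fixed-rate level-payment formula, not PROC LOAN; the input dataset LOANS and its columns (id, principal, rate, term) are assumptions.

```sas
/* Hypothetical input: one row per loan with id, principal, annual rate, term. */
data schedules;
  set loans;
  monthly_rate = rate / 12;
  /* Level monthly payment for a fixed-rate amortizing loan. */
  payment = principal * monthly_rate / (1 - (1 + monthly_rate) ** (-term));
  balance = principal;
  do time = 1 to term;              /* e.g. term = 360 months */
    interest       = balance * monthly_rate;
    principal_paid = payment - interest;
    balance        = balance - principal_paid;
    output;                         /* one row per id and time */
  end;
  keep id time payment interest principal_paid balance;
run;
```

The result is already one long dataset keyed by id and time, ready to merge with the base panel without any intermediate tables.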

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers



All Replies
Super User
Super User
Posts: 7,392

Re: Join/Merge 100,000 data files.. Most efficient technique?

As @KurtBremser has said, creating 100,000 tables doesn't sound like an efficient method of working.  

One thing you could do is to set all the files, then transpose them up:

data f1;                      /* two tiny example tables with the same layout */
  id=1; time=2; payment=2;
run;
data f2;
  id=2; time=10; payment=5;
run;
data inter;
  set f:;                     /* f: wildcard reads every dataset starting with "f" */
run;
proc transpose data=inter out=want prefix=time;
  by id;                      /* one output row per id */
  var payment;
  id time;                    /* column names built from the time values */
  idlabel time;
run;

Just as an example - you can use the : modifier to include all files with a given prefix, which saves typing them out. Do note, this isn't a good structure to program with - i.e. having all the data across. Unless it's specifically for an output report, stick with the inter dataset, program with that, and only transpose for your output procedure - it will save you lots of messy coding.

Occasional Contributor
Posts: 7

Re: Join/Merge 100,000 data files.. Most efficient technique?

Hi guys,

Thanks for your advice. Just to let you know what I did: I followed KurtBremser's suggestion and split the data into several smaller batches, and then everything worked!

Thanks!

Konstantin
