About Hugochum

Hugochum · ‎02-12-2021

Roughly, the data has the following form: data A; /* ... */ client_id = 1234567890; value = 42; output; /* ... */ run; data B; /* ... */ date = '01JAN1997'd; factor = 0.2; output; /* ... */ run; The dataset to store should list a score for each client_id and date , like so: data AB; set A; do point_B = 1 to nobs_B; set B point = point_B nobs = nobs_B; score = factor * value; output; end; run; Probably there is an adhoc solution for this concrete problem (like splitting A into chunks). But it would be quite instructive also for future applications to know how to efficiently loop through multiple datasets.

Hugochum · ‎02-12-2021

What I need to do, boils down (more or less) to joining two huge datasets A and B. The first approach is of course: proc sql; create table AB as select * from A join B on 1; quit; But this is infeasible due to memory restrictions. The obvious alternative is to loop, for each observation of A, through each observation of B. My current naive approach (ab)uses the POINT option of the SET statement: data AB; set A; do point_B = 1 to nobs_B; set B point = point_B nobs = nobs_B; output; end; run; This works - but is incredibly slow. Certainly, this can be remedied by replacing the random access of POINT by sequential access. Unfortunately, I was not able to figure out how this can be achieved. So any suggestions are greatly appreciated. For instance, what would be nice to have is some option RESET (similar to KEYRESET) making the following code work after uncommenting reset = end_A : data AB; do while(not end_A); set A end = end_A; do while(not end_B); set B end = end_B /* reset = end_A */; output; end; end; run; (If there is no other solution, I might have to split A into multiple smaller datasets first...)

Online Status	Offline
Date Last Visited	‎02-12-2021 02:05 PM

Re: Reset observation pointer

Reset observation pointer

Re: Reset observation pointer

How to: List all the files in a folder

Re: Reset observation pointer

Reset observation pointer