Way to resume PROC SORT from LCK file? - Page 2

data_null__ · Posted 05-06-2014 08:35 AM

Maybe you saw this SAScommunity.org tip: HASH sort (SAScommunity.org tip-of-the-day, 08OCT2013).

I applied CALL VNEXT to generalize it somewhat.
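For readers who have not seen the tip, a hash sort can be sketched roughly as follows. This is only a minimal illustration, not the code from the tip and not the CALL VNEXT generalization; `sashelp.class`, the keys `sex` and `age`, and the output name `work.class_sorted` are stand-ins chosen for the example, and the whole table is assumed to fit in memory.

```sas
/* Hash sort sketch: load a table into an ordered hash, then write it out. */
/* Assumption: the whole table fits in available memory.                   */
data _null_ ;
   if 0 then set sashelp.class ;   /* defines the variables, reads no rows */
   declare hash h( dataset: 'sashelp.class', ordered: 'a', multidata: 'y' ) ;
   h.defineKey( 'sex', 'age' ) ;   /* sort keys (example choice)           */
   h.defineData( all: 'y' ) ;      /* carry every variable                 */
   h.defineDone( ) ;               /* loads the table into the hash        */
   h.output( dataset: 'work.class_sorted' ) ;  /* written in key order     */
   stop ;
run ;
```

`multidata: 'y'` keeps rows that share the same key values; without it only one row per key would survive.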

Peter_C · Posted 05-06-2014 08:23 AM

It is the roll-up of GBs of data that prompted a suggestion of proc summary operating on blocks or subsets. As proc summary is more memory-based and reduces I/O on output, it provides a trade-off, but it demands memory, so that is an issue to manage. Of course, a hash approach would be best if the result set could sit in available memory.
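A hash roll-up of that kind might look like the sketch below. It is an illustration under assumptions, not code from this thread: `sashelp.class`, the keys `sex` and `age`, the analysis variable `height`, and the names `total_height`, `n` and `work.height_by_key` are all hypothetical stand-ins.

```sas
/* Hash roll-up sketch: one hash entry per key combination holds running totals. */
data _null_ ;
   length total_height n 8 ;
   declare hash h( ordered: 'a' ) ;
   h.defineKey( 'sex', 'age' ) ;
   h.defineData( 'sex', 'age', 'total_height', 'n' ) ;
   h.defineDone( ) ;
   do until ( eof ) ;
      set sashelp.class( keep= sex age height ) end= eof ;
      if h.find( ) ne 0 then do ;   /* first time this key is seen */
         total_height = 0 ;
         n = 0 ;
      end ;
      total_height = sum( total_height, height ) ;
      n = n + 1 ;
      h.replace( ) ;                /* store the updated totals    */
   end ;
   h.output( dataset: 'work.height_by_key' ) ;
   stop ;
run ;
```

Memory use grows with the number of distinct key combinations rather than with the number of input rows, which is why the size of the result set is the deciding factor.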

To start discovering the potential of all these alternatives, we would do an NLEVELS analysis of the sort keys or roll-up variables with proc freq.
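Such an NLEVELS check could be sketched like this. The macro variables `ds` and `keys` are hypothetical stand-ins; point them at the real table and its sort keys.

```sas
%let ds   = sashelp.class ;   /* hypothetical stand-in for the large table */
%let keys = sex age ;         /* hypothetical stand-in for the sort keys   */

ods select nlevels ;          /* show only the Number of Variable Levels table */
proc freq data= &ds nlevels ;
   tables &keys / noprint ;   /* suppress the frequency tables themselves      */
run ;
```

NLEVELS counts the distinct values of each variable separately, so the number of key combinations is at most the product of the reported levels.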

peter

Peter_C · Posted 05-10-2014 12:26 PM

Here is some code to demonstrate the blocks suggestion I was making. The test run creates a 6GB SAS dataset, first sorts it in blocks, then does a roll-up (proc summary) in blocks of increasing size:

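data tlarge ;
   set sampsio.empinfo ; * should be available, just try it ;
   do _n_ = 1 to 1e5 ; /* write each input row 100,000 times */
      output ;
   end ;
run ;

%let byv = divcode division ; /* sort keys and roll-up variables */

libname user ( './' work ) ; /* one-level names now resolve through USER: current directory first, then WORK */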

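data _null_ ;
   retain start 1 ;
   do exp = 4 to 8 ;
      block = 10 ** exp ; /* last observation of this block */
      /* generate one PROC SORT per block, covering rows start through block */
      call execute( 'proc sort data= tlarge( firstobs=' !! put( start, 9.-L ) ) ;
      call execute( ' obs = ' !! put( block, 9.-L ) ) ;
      call execute( ') out=tlarg_' !! put( start, 9.-L ) ) ;
      call execute( "; by &byv ; run ;" ) ;
      start = block + 1 ; /* the next block begins after this one */
   end ;
run ;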

data _null_ ;
   retain start 1 ;
   do exp = 4 to 8 ;
      block = 10 ** exp ; /* last observation of this block */
      /* generate one PROC SUMMARY per block, covering rows start through block */
      call execute( 'proc summary data= tlarge( firstobs=' !! put( start, 9.-L ) ) ;
      call execute( ' obs = ' !! put( block, 9.-L ) ) ;
      call execute( ') missing noprint nway ; ' ) ;
      call execute( "class &byv ; var _numeric_ ;" ) ;
      call execute( 'output sum= out= tlasum' !! put( start, 9.-L ) ) ;
      call execute( "; run ;" ) ;
      start = block + 1 ; /* the next block begins after this one */
   end ;
run ;
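The code above stops once the per-block outputs exist. One possible way to recombine them is sketched below; it is not part of the original post. It assumes the `tlarg_…` and `tlasum…` data sets and the `byv` macro variable from the steps above, and the output names `tlarge_sorted`, `block_sums` and `tlarge_rollup` are made up for the example.

```sas
/* Interleave the sorted blocks: each input is already in BY order, */
/* so SET with BY yields one fully sorted data set.                 */
data tlarge_sorted ;
   set tlarg_: ;            /* every data set whose name starts with tlarg_ */
   by &byv ;
run ;

/* Stack the block summaries, then add up the partial sums per key. */
data block_sums ;
   set tlasum: ;            /* every data set whose name starts with tlasum */
   drop _type_ _freq_ ;     /* automatic variables from the first pass      */
run ;

proc summary data= block_sums missing nway ;
   class &byv ;
   var _numeric_ ;
   output sum= out= tlarge_rollup ;
run ;
```

Re-summarizing works here because sums are additive across blocks; statistics such as means would have to be rebuilt from sums and counts, so keep `_freq_` (renamed) if the counts are needed.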

jakarman · Posted 05-12-2014 01:26 PM

It has always been the best approach to make your data smaller without losing information, and it always will be once you go beyond the comfort zone of your machine.

A machine will never have unlimited speed or unlimited storage. The moment you hit those limits, you need to think harder and find a cleverer approach to the data or to your analysis.

@Gergely, your proposal comes back to that question. When its assumptions are not met, the issue remains. It also depends on how often the job has to be executed.

A Hadoop-style approach asks for a lot of storage, even more than plain sorting. It is designed for mostly read-only access and keeps many duplicated copies of the data spread around.

---->-- ja karman --<-----
