I think something is missing in your code: you should either sort LongFile_t or build an index while creating it. I replicated your program in a somewhat simplified version. The monthly files are created by the following step:

data one; * repeat with data sets two - twelve;
   length ID $1;
   do n=1 to 2000000;
      x = 100*uniform(0);
      a = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
      ID = substr(a,int(uniform(0)*26+1),1);
      output;
   end;
run;

Then I glue them together, either by sorting (note that I use a DATA step VIEW to glue the files together virtually; that saves writing out the long file):

data long/view=long;
   set one two three four five six seven eight nine ten eleven twelve;
run;
proc sort data=long out=long2;
   by ID;
run;

or, the way you did it, but creating an index on ID on the fly:

data long(index=(ID));
   set one two three four five six seven eight nine ten eleven twelve;
run;

Via both paths you can do the final merge; a rough sketch of both variants is at the bottom of this post. I selected 4 IDs to get from the big file as my target population (selecting some 3.7 million observations). These were my running times for the various steps:

Create the long file with the index: real time 5.80 seconds, CPU time 3.57 seconds
Merge based on the indexed file: real time 10.04 seconds, CPU time 8.87 seconds
Create the long file via the DATA step VIEW: real time 0.01 seconds, CPU time 0.00 seconds
Sort the VIEW: real time 5.95 seconds, CPU time 2.15 seconds
Merge based on the sorted long file: real time 1.33 seconds, CPU time 0.34 seconds

The alternative approach would be to sort the separate PCS files, filter your group at that level, and glue the resulting files together (also sketched at the bottom). Sorting one file of 2,000,000 observations took 0.33 seconds real time and 0.11 seconds CPU; creating the selection took 0.13 seconds real time and 0.10 seconds CPU. Multiply that by 12, add a step to glue everything together, and you are possibly still better off.

I hope this helps.
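For reference, here is roughly what my final merge step looked like. This is only a sketch: the targets data set and the four ID values in it are made up for illustration, and the indexed variant assumes the data long(index=(ID)) version of the long file.

/* Hypothetical target population: four made-up ID values, entered in sorted order */
data targets;
   input ID $1.;
   datalines;
A
F
K
P
;
run;

/* Sorted path: match-merge the sorted long file with the targets */
data want_sorted;
   merge long2(in=inlong) targets(in=intarget);
   by ID;
   if inlong and intarget;
run;

/* Indexed path: a WHERE clause on the index variable can let SAS */
/* use the index on the long file instead of doing a full scan    */
data want_indexed;
   set long;
   where ID in ('A','F','K','P');
run;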
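And a sketch of that alternative route, again with made-up ID values and hypothetical names sel1 - sel12; a WHERE= data set option on the way into PROC SORT filters each monthly file before it is sorted.

/* Filter and sort one monthly file; repeat for two - twelve, creating sel2 - sel12 */
proc sort data=one(where=(ID in ('A','F','K','P'))) out=sel1;
   by ID;
run;

/* Glue the twelve small files together; SET with BY interleaves */
/* them, so the result stays sorted by ID                        */
data selection;
   set sel1 sel2 sel3 sel4 sel5 sel6 sel7 sel8 sel9 sel10 sel11 sel12;
   by ID;
run;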