Hello friends. This is my first venture into this forum, though I've been programming in SAS for 25+ years... A process that updates a cumulative dataset (referenced as sdcum in the snip) with a weekly update file (referenced as sdwkly) has become problematic as it takes way too long to run. The cumulative dataset, in one case, has grown to about 460M records, while the update data may contain as few as a couple hundred records. When the sdcum file is smaller, or the update datasets are much larger, the process runs much quicker (strangely enough). It is when the update files get small (as the sdcum files become older) that things slow down. The update dataset (sdwkly) has a combination of new and revised records. So the goal is to add if it is new, and keep the most current if it is an update. I have no way of knowing which it is heading into the process. The "Select Distinct" approach seems to be quite the hog. For the one dataset in question here (460M records), the update now takes about 1.5 Hours. Repeating that over 13 datasets, means this process has become unwieldy. Any recommendations on a more efficient approach (short of porting the entire process into a SQL environment)? Thanks! proc sql ; drop index VIN_VEH_INEX&dbase FROM sdcum.VIN_VEH_OPTNS_&dbase ; quit ; * create a temp dataset from cumulative and append weekly to it ; data sdcum.TEMP_VIN_VEH_OPTNS_&dbase ; set sdcum.VIN_VEH_OPTNS_&dbase ; run ; Proc append Base=sdcum.TEMP_VIN_VEH_OPTNS_&dbase Data=sdwky.VIN_VEH_OPTNS_&dbase ; Run ; proc sql ; create table sdcum.VIN_VEH_OPTNS_&dbase as select Distinct GMC_VEH_IDENT_NBR, GMC_OPTN_CD, max(GMC_DWH_UPD_TIMSTM_VV) as GMC_DWH_UPD_TIMSTM_VV format=YYMMDD10. , max(GMC_PULL_DT) as GMC_PULL_DT format=YYMMDD10. from sdcum.TEMP_VIN_VEH_OPTNS_&dbase group by GMC_VEH_IDENT_NBR,GMC_OPTN_CD order by GMC_VEH_IDENT_NBR,GMC_OPTN_CD,GMC_DWH_UPD_TIMSTM_VV desc,GMC_PULL_DT desc ; drop table sdcum.TEMP_VIN_VEH_OPTNS_&dbase ; quit ; proc sql ; create index VIN_VEH_INEX&dbase ON sdcum.VIN_VEH_OPTNS_&dbase (GMC_VEH_IDENT_NBR, GMC_OPTN_CD) ; quit ;
... View more