Hello all
SAS 9.3
EG 5.1
Windows 7
I have two tables History,Transaction. History table is huge(51million records). We need to update the history table everyday with the transaction table i.e Update already exisiting records and Insert new records. But the problem with the merge is sorting both the tables which is taking almost 3.5 hours and the merge around 3.5 hours.
Usually we do Delete and Insert in SQL. Looking to find a way to make this process a bit quicker on SAS.
Looking forward for your valuable suggestions.
Proc sort Data =History ; by common_KEY; run; Proc sort Data =transaction; by common_KEY; run; DATA History; MERGE History transaction; BY common_KEY; RUN;
If your existing HISTORY is the result of the same process there is no need to sort it again. Merge preserves the order in the output.
Usually, when dealing with large tables, one needs to dive into performance tuning:
- have a separate UTILLOC defined in the sasv9.cfg file. UTILLOC is the place where the utility file during a sort is stored.
- have both WORK and UTILLOC physically separated on the fastest disks available (SSD preferred, nowadays).
- try to avoid reading and writing simultaneously on the same physical resource; have source and target tables during the merge on physically separate disks.
- use the compress=yes option to reduce physical dataset size, if that is not already in place.
Until we switch to a really fast SAN environment, we have this setup:
Production data on SAN
WORK location internally in the server, separate disk(s) for UTILLOC
Additional disks in the server, used as additional temporary locations
The process would look like this:
- sort dataset1 from production to temp1
- sort dataset2 from production to temp2
- merge temp1.dataset1 and temp2.dataset2 back to production
before doing the merge, you can check for &SYSCC=0 to prevent that step in case something went wrong during the sorts or other preliminary steps
@KiranMaddi wrote:
RE: If your existing HISTORY is the result of the same process there is no need to sort it again. Merge preserves the order in the output.
That's an interesting point. I never knew that Merge would preserve the order. If this is the case I would never need to sort the histoty table?
Absolutely. The only difference is that the sort bit in the table header is not set, but that does not affect a further merge.
As long as you never resort the HISTORY table by another column, you can merge without sorting.
Part of the efficiency of the data step merge is its simplicity.
Even if you do
proc sort dataset1;
by id;
run;
proc sort dataset2;
by id;
run;
data dataset3;
set
dataset1
dataset2
;
by id; *!;
run;
the final dataset3 will be sorted by id. This is useful when interleaving datasets with identical columns.
Have you tried update/modify instead of merge?
You can make an INDEX for this BY variable to avoid PROC SORT again.
OK. Assuming there are unique ID number for each row.
data history;
do id=1 to 100;
x='x';output;
end;
run;
data transaction;
do id=90 to 102;
x='y';output;
end;
run;
data want;
if _n_=1 then do;
if 0 then set transaction;
declare hash h(dataset:'transaction');
declare hiter hi('h');
h.definekey('id');
h.definedata('id','x');
h.definedone();
end;
set history end=last;
if h.find()=0 then h.remove();
output;
if last then do;
do while(hi.next()=0);
output;
end;
end;
run;
Thanks a lot Ksharp.
The unique index is on 2 variables, so I would expect some duplicates in the ID.
I have never used hashing on SAS. Please,would you mind explaining briefly what the code is doing in the third step?
First of all , you need to know you are doing One-to-One match or Many-to-Many match ?(my code is for one-to-one match)
Hash Table is a long story, I can't tell you its detail, check documentation on your own.
If you have lots of variable ,make a macro variable contain all these variables name, Like:
select quote(name) into : list separated by ','
from dictionary.columns
where .............
Yes. My code is only for one-to-one match.
If you want do one-to-many match. Can you post an example and its output to explain it ?
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.