The datasets are not identical in metadata and share a few overlapping variable names, so I worry about stacking them: I wouldn't want to accidentally truncate values where the variable names matched but the lengths (and other attributes) disagreed. Because of that, I like your approach of renaming the variables to something new and then renaming them back after the merge. It could work, but I would also have to drop any variables that were in one dataset but not in the appended dataset; this is shown in the second example of the code I originally attached. It would also be easier if I used your renaming method and gave each variable a dataset-specific prefix, which would simplify my KEEP= statements.

I might try using multiple formats to see how that works out. I'm not a fan of needing to concatenate the multiple primary keys into one for the hash index and then doing the same on the base dataset; that seems like a lot of reads to me.

At this point I think I'm just going to sort/sort-merge my lookup tables and split up the larger files prior to merging them, or revisit the settings on my UNIX server. Right now my config file has MEMSIZE=8G, SORTSIZE=1G, BUFNO=20, and BUFSIZE=512K. I'll need to see if I can raise some of these to get the tables to load into the hash faster. It's a virtual machine, so potentially I could have the server admin allocate more memory for this task, or move the job to a higher-powered server. Thanks!
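To make the prefix-rename idea concrete, here is a minimal sketch. The dataset names (WORK.A, WORK.B) and variables (NAME, AMOUNT) are purely illustrative, not taken from the actual data; the point is that prefixing every variable in one dataset before stacking means an overlapping name with a shorter length can never silently truncate the other's values, and the prefixed names make the KEEP= lists easy to write.

```sas
/* Hypothetical datasets WORK.A and WORK.B share the variable NAME,
   but its defined length differs between the two. */

/* Step 1: prefix every variable in B so nothing collides on append. */
proc datasets lib=work nolist;
  modify b;
    rename name=b_name amount=b_amount;  /* one rename pair per variable */
quit;

/* Step 2: stack the datasets; the prefixed variables from B stay
   distinct, so no attribute conflict and no truncation can occur. */
data stacked;
  set a b;
run;

/* Step 3 (after the merge): rename back, or simply keep the prefixed
   set you need -- e.g. keep=b_: selects everything that came from B. */
data final;
  set stacked(keep=b_:);
run;
```

The `b_:` name-prefix list in the KEEP= option is what simplifies variable selection once everything from one source shares a common prefix.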
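On the composite-key concern: the hash object's DEFINEKEY method accepts multiple variables directly, so the primary keys do not have to be concatenated into a single variable on either the lookup table or the base dataset. A sketch, assuming a hypothetical lookup table WORK.LOOKUP keyed on (ID, DATE) with a value VAL, and a base dataset WORK.BASE:

```sas
/* Hash lookup with a multi-variable key: ID and DATE are passed as
   separate arguments to definekey(), so no concatenated key -- and
   no extra pass over either dataset -- is needed. */
data out;
  if _n_ = 1 then do;
    declare hash h(dataset:'work.lookup');
    h.definekey('id','date');   /* composite key, no concatenation */
    h.definedata('val');
    h.definedone();
    call missing(val);          /* avoid uninitialized-variable notes */
  end;
  set work.base;
  if h.find() = 0 then output;  /* keep matching rows only */
run;
```

Whether the lookup table still fits in memory is a separate question (that is where MEMSIZE comes in), but the concatenation step itself can be avoided.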