W1ndwaker
Obsidian | Level 7

Hello,

 

I'm currently working on a migration from AIX to Linux (RHEL), and we need to move our SAS dataset files from the source to the new server. Since there is an incompatibility between the filesystems, we are using PROC MIGRATE with the source libraries mapped on the new RHEL server. PROC MIGRATE averages about 150 GB/h per running PROC (I've seen that I can run more than 5 PROCs at the same time without problems), and this takes around 10-20% of the CPU. My question: is there any possibility to increase the resources a PROC takes, to improve performance and get better migration rates? My goal would be 300 GB/h, in order to have all the data migrated in 1 day.
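
A minimal sketch of this kind of setup, with the source AIX library mounted on the new server (librefs and paths are hypothetical):

    /* source library: AIX-format files mounted on the RHEL server */
    libname aixlib '/mnt/aix_export/sasdata' access=readonly;

    /* target library in the session's native (Linux) format */
    libname rhelib '/sasdata/native';

    /* PROC MIGRATE reads the foreign-format files and rewrites them natively */
    proc migrate in=aixlib out=rhelib;
    run;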

 

If anyone has encountered this situation during a migration, maybe you can help!

 

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
W1ndwaker
Obsidian | Level 7
After some more tests I got to the point that PROC MIGRATE cannot go faster than a certain speed per thread. I found a parameter, CPUCOUNT, to force the utilization of all available CPUs in the system; this, combined with a moment of no other work on the source disk, gave me 220 GB/h, which is quite good. Even then, the target server was only running at about 20% CPU usage, and if I run 2 PROC MIGRATEs I get double the rate: 430 GB/h and 40% CPU usage.

In the end I suppose the best way is splitting the big directories. I have a few of them with more than 300 files and 2-3 TB each; if 1 PROC takes around 10-12 hours to finish, splitting the folder into 3 subfolders gets this down to 3-4 hours, and the system runs correctly without errors. This is what we will go with for the migration day.
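
A minimal sketch of the CPUCOUNT approach described above; CPUCOUNT=ACTUAL tells SAS to use as many CPUs as the operating system reports (librefs and paths are hypothetical):

    /* let the session use all available CPUs */
    options cpucount=actual;

    /* migrate one of the split subfolders; run a second SAS session
       against another subfolder in parallel to stack the throughput */
    libname part1 '/mnt/aix_export/biglib/part1' access=readonly;
    libname tgt1  '/sasdata/native/biglib/part1';

    proc migrate in=part1 out=tgt1;
    run;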


10 REPLIES
Kurt_Bremser
Super User

I take it that you have to move your data over the network. 300 GB/h translates to roughly 85 MB/s (300,000 MB / 3,600 s ≈ 83 MB/s, i.e. about 670 Mbit/s), which means you need 1 Gbit/s of network bandwidth throughout the whole distance, including all switches/firewalls.

 

W1ndwaker
Obsidian | Level 7
I suppose bandwidth is not a problem, since I got 6 simultaneous PROCs running, giving a maximum of 720 GB/h at peak performance. The problem comes when a single directory (or library, whatever you want to call it) holds a big amount of data, like 2 TB; then I can only run one PROC MIGRATE on that folder, and this is where the bottleneck is for me. If I could get a single PROC MIGRATE to 300 GB/h, that would be great for the folders of about 2.5 TB.

Thanks for the reply btw!!
Kurt_Bremser
Super User

I suspect that the MIGRATE procedure is single-threaded, which would explain the limitation.

Have you tried this:

  • copy the .sas7bdat file from one server to the other
  • run a simple data step on the target server to rewrite the dataset, making one-time use of CEDA
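
A minimal sketch of that two-step approach, assuming the copied AIX files land in a directory on the target server (librefs, paths, and the dataset name are hypothetical):

    /* the copied AIX-format files: SAS reads them through CEDA */
    libname aixcopy '/sasdata/copied' access=readonly;

    /* native-format target library */
    libname native '/sasdata/native';

    /* rewriting with a data step produces a native Linux dataset */
    data native.mydata;
      set aixcopy.mydata;
    run;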
W1ndwaker
Obsidian | Level 7
Yep, I did. We weighed up the possible methods for migrating the whole data, and that one probably works faster, but there are 2 more points: first, you need to move the whole data from A to B, which also takes a lot of time, and then run a DATA step on every single dataset, considering we can have thousands.

Summing up, the best option of all the possibilities is to migrate from the source directory mapped on the new server, but it would be great to have a multithreaded process (which I think PROC MIGRATE uses while creating the indexes of the migrated tables).

Reading the doc, I didn't see any option other than BUFSIZE=, and that doesn't seem to have any effect on the process.
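
If the per-dataset DATA step route were taken, the steps would not have to be written by hand; a sketch that generates one rewrite step per dataset from DICTIONARY.TABLES (librefs as in the earlier sketch, i.e. hypothetical):

    /* collect every dataset name in the copied library */
    proc sql noprint;
      select memname into :dslist separated by ' '
      from dictionary.tables
      where libname = 'AIXCOPY' and memtype = 'DATA';
    quit;

    /* rewrite each dataset into the native-format library */
    %macro rewrite_all;
      %local i ds;
      %do i = 1 %to %sysfunc(countw(&dslist));
        %let ds = %scan(&dslist, &i);
        data native.&ds;
          set aixcopy.&ds;
        run;
      %end;
    %mend rewrite_all;

    %rewrite_all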
SASKiwi
PROC Star

SAS/CONNECT with PROC UPLOAD is another way of migrating between SAS installations. You do need the product installed and licensed on both SAS installations, though. There are also the Base SAS CPORT/CIMPORT procedures, which can handle whole SAS libraries in one process.
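
For reference, the CPORT/CIMPORT route goes through a transport file; a minimal sketch (librefs and paths are hypothetical):

    /* on the source (AIX) server: dump the whole library to a transport file */
    libname prod '/sasdata/prod';
    filename tranfile '/tmp/prodlib.cpo';

    proc cport library=prod file=tranfile;
    run;

    /* after moving the transport file (in binary mode) to the RHEL server */
    libname native '/sasdata/native';
    filename tranfile '/tmp/prodlib.cpo';

    proc cimport library=native infile=tranfile;
    run;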

W1ndwaker
Obsidian | Level 7
Actually I've already tried all these methods, getting to the point that MIGRATE is the best way. CPORT/CIMPORT is trickier, since you have to do 2 operations for the same result as a single MIGRATE. The showstopper is that PROC MIGRATE works at a capped speed; I also tried modifying the sasv9.cfg performance options to give the session more resources, in order to get more GB/h during the PROC MIGRATE, but that didn't work out.
Kurt_Bremser
Super User

I would set up a method to copy (sftp) several files in parallel, and then (also in parallel) run data steps to do the conversion.

Before that, run tests to see if CEDA (using the files as is) is feasible in production use.
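
A quick way to test CEDA feasibility is to point a read-only libref at the copied files and read them in place; a minimal sketch (paths and the dataset name are hypothetical):

    /* copied AIX files, read as-is through CEDA */
    libname asis '/sasdata/copied_from_aix' access=readonly;

    /* check that SAS can open every member */
    proc contents data=asis._all_ nods;
    run;

    /* time a full pass over a large table to gauge the CEDA read overhead */
    data _null_;
      set asis.big_table;
    run;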

SASKiwi
PROC Star

Definitely worth exploring parallel processing in that case.

Tom
Super User

@W1ndwaker wrote:
I suppose bandwidth is not a problem, since I got 6 simultaneous PROCs running, giving a maximum of 720 GB/h at peak performance. The problem comes when a single directory (or library, whatever you want to call it) holds a big amount of data, like 2 TB; then I can only run one PROC MIGRATE on that folder, and this is where the bottleneck is for me. If I could get a single PROC MIGRATE to 300 GB/h, that would be great for the folders of about 2.5 TB.

Thanks for the reply btw!!

Sounds like the bottleneck is the speed of reading from that location. Or, perhaps more likely, the problem is that you are writing them all to the same target location, as writing takes more time than reading since you get no boost from caching. Can you split the files into batches and write them to different physical output disks?
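
In practice that could look like two independent SAS sessions, each migrating one batch to a different physical disk; a minimal sketch (librefs and paths are hypothetical):

    /* session 1 */
    libname in1  '/mnt/aix_export/biglib/batch1' access=readonly;
    libname out1 '/disk1/sasdata/biglib';

    proc migrate in=in1 out=out1;
    run;

    /* session 2, started as a separate SAS invocation */
    libname in2  '/mnt/aix_export/biglib/batch2' access=readonly;
    libname out2 '/disk2/sasdata/biglib';

    proc migrate in=in2 out=out2;
    run;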

 

