Re: Moving large datasets from one system to another

naz181 · Posted 11-16-2023 05:54 AM

Hi,

I am trying to move some large datasets from my Windows environment to a new Linux environment. The encoding etc has been updated to ensure that the datasets will work on Linux.

I read that PROC CPORT and PROC CIMPORT are the recommended way to create transport files and import them for a migration, which is fine.

However, the datasets are quite large, and the resulting transport files are also very big, so transferring them will take a long time.

I read somewhere that you can use GZip or some other zip type utility to zip up data and make its size smaller (some suggested the ODS package method and others have suggested the GZip filename - either are fine for me).

My question: if I use GZIP with the supported approach, which is to zip the data itself rather than the transport file from CPORT, is it then as simple as moving it from the Windows system to the Linux system, unzipping it, and using it? If so, can I do away with CPORT and CIMPORT altogether?

Thanks,

N.

yabwon · Posted 11-16-2023 06:04 AM

I can recommend the BasePlus package and its two dedicated macros: %zipLibrary() and %unzipLibrary().

They both use internal SAS zip functionality, so no extra software is needed, and they are OS independent.

To use packages you need to download them and use the SAS Packages Framework. Check out the framework's repository to see how to work with it.

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation

naz181 · Posted 11-16-2023 06:18 AM

Thanks Bart, that's interesting. I've never come across SAS packages before.

Do you know whether the datasets should just work once they are unzipped onto the target system?

Thanks

yabwon · Posted 11-16-2023 06:24 AM

That was the assumption when I was designing and writing those macros (read the documentation; you will find examples there).

If you would like to get a "general overview" of the SAS Packages idea, check out this: https://github.com/yabwon/SAS_PACKAGES#recordings-and-presentations

This is a list of my presentations about SAS Packages. The one titled "A BasePlus Package for SAS" (SAS Explore 2022) is about basePlus.

Bart

SASKiwi · Posted 11-16-2023 02:19 PM

Please note Windows and Linux SAS datasets have different structures, so you can't just zip, copy and unzip them. Either you use CPORT / CIMPORT as you already know or you can use SAS/CONNECT PROC UPLOAD, which does dataset conversion on the fly.
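For readers who have not used the transport-file route, a minimal sketch of the CPORT / CIMPORT round trip is shown here. All paths and the librefs `src` and `tgt` are hypothetical placeholders, not taken from the original post; adjust them to your environment.

```sas
/* --- On the Windows side: write the library to a transport file --- */
libname src 'C:\data\sas';                 /* hypothetical source library */
filename tranfile 'C:\transfer\src.cpt';   /* hypothetical transport file */

proc cport library=src file=tranfile memtype=data;
run;

/* --- On the Linux side, after copying src.cpt across in binary mode --- */
libname tgt '/data/sas';                   /* hypothetical target library */
filename tranfile '/transfer/src.cpt';

proc cimport library=tgt infile=tranfile;
run;
```

The transport file has to be moved as a binary file; a text-mode FTP transfer can corrupt it.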

yabwon · Posted 11-16-2023 03:08 PM

Between Linux and Windows it shouldn't cause problems.

You can always write a "Linux dataset" under Windows with the OUTREP= data set option.

And if you remember to use UTF-8 encoding, you shouldn't have transcoding issues.

[EDIT:] Here is the link for OUTREP= documentation with the list of all systems: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ledsoptsref/n0p1yuyzltd52jn1dubaao0dlk2m.htm
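As an illustration only, writing Linux-format copies from a Windows session might look like the sketch below. The librefs, paths, and the dataset name `big` are assumptions made for the example; `LINUX_X86_64` is one of the OUTREP= values listed in the documentation linked above.

```sas
libname src 'C:\data\sas';                           /* hypothetical source library */

/* One dataset, using the OUTREP= data set option */
libname lnx 'C:\transfer\linux';                     /* hypothetical staging folder */
data lnx.big(outrep=linux_x86_64);
  set src.big;                                       /* hypothetical dataset name */
run;

/* A whole library, using OUTREP= on the LIBNAME statement.        */
/* NOCLONE tells PROC COPY to use the output library's             */
/* representation instead of cloning the source data set's.        */
libname lnxall 'C:\transfer\linux_all' outrep=linux_x86_64;
proc copy in=src out=lnxall memtype=data noclone;
run;
```

The files in the staging folder can then be copied to Linux in binary mode and read there without Cross Environment Data Access.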

Bart

SASKiwi · Posted 11-16-2023 03:56 PM

@yabwon - Thanks for pointing out the OUTREP= option. Will it also work OK on the target Linux server to read in a Windows dataset and write out a Linux-format one?

yabwon · Posted 11-17-2023 02:31 AM

You can create a "Windows dedicated" data set while working on Linux, and a "Linux dedicated" data set while working on Windows.

When you are reading a dataset you don't need the OUTREP= option; SAS figures out what to do based on the data set metadata in the header.

Of course, in all 4 possible cases (2 for creation, 2 for reading) SAS lets you know that "translation" happens with this little note:

NOTE: Data file LLLL.XXXXXX.DATA is in a format that is native to another host, or the file encoding does not match the
session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.

[EDIT:] Here is the link for OUTREP= documentation with the list of all systems: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ledsoptsref/n0p1yuyzltd52jn1dubaao0dlk2m.htm

Bart

Patrick · Posted 11-16-2023 07:06 AM

Assuming you've got a remote Windows server, a remote Linux server and a local machine: How do you plan to move the data? Directly server1->server2 or server1->local machine->server2?

If possible, server->server would be preferable and would very likely give better network speed.

Instead of zipping your files you could also look into tools/protocols that allow for data compression in transit. What will work for you depends on what's available on your Windows server. I believe compression in transit exists for sftp, scp and rsync.

Another option would be a file system or a data lake that's accessible from both environments.

And last but not least: if you've got SAS/CONNECT on both machines and the ports are open, then you could also transfer the data directly via SAS (run in batch on the Linux side with nohup). This way you wouldn't need to bother with creating transport files.
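A rough sketch of that SAS/CONNECT route, run from the Linux session, could look like this. The host name, port, paths, and librefs are all hypothetical, and the sign-on details depend on how SAS/CONNECT is set up at your site.

```sas
/* Run in the Linux SAS session, which acts as the SAS/CONNECT client. */
%let winsrv = winserver.example.com 7551;   /* hypothetical host and spawner port */
options comamid=tcp remote=winsrv;

/* Interactive prompt; a batch (nohup) run needs another authentication setup */
signon user=_prompt_;

libname tgt '/data/sas';                    /* local Linux target library */

rsubmit;
  libname src 'C:\data\sas';                /* library on the Windows server */
  /* INLIB= is resolved on the server, OUTLIB= on the client */
  proc download inlib=src outlib=tgt memtype=data;
  run;
endrsubmit;

signoff;
```

The downloaded datasets are written in the Linux session's native format, so no transport file or separate conversion step is involved.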

yabwon · Posted 11-16-2023 07:18 AM

BTW.

Googling @Patrick's sentence directly ("SAS connect on both machines and the ports are open then you could also transfer the data directly via SAS") returns this link to a paper about SAS/CONNECT and data transfer: https://support.sas.com/resources/papers/proceedings/proceedings/sugi24/Advtutor/p43-24.pdf

Bart

Sajid01 · Posted 11-17-2023 03:28 PM

Hello @naz181

1. Compressing/uncompressing large datasets or transport files does take time and resources.

2. In my experience the optimal approach is to use ssh to move the transport files from Windows to Linux. You can also use FTP clients.
It may take time, but the process is time-tested.
Having worked with SAS in multi-OS environments, I know of places where batch processes using ssh move large transport files from Linux to Windows on a routine basis.

Tom · Posted 11-17-2023 04:02 PM

Windows and Unix write compatible datasets. SAS can read a Windows-created dataset on a Unix machine and the reverse. But there is a performance hit. So you probably want to re-create the dataset on Unix once it is there.
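One way to do that re-creation, sketched with hypothetical paths and librefs, is to point one libref at the folder holding the copied Windows-format files and copy them into a second library:

```sas
/* On the Linux server; both paths are hypothetical */
libname moved  '/data/incoming';   /* datasets copied as-is from Windows */
libname native '/data/sas';        /* where the native copies should live */

/* NOCLONE makes PROC COPY write the output in the data representation */
/* of the current session rather than cloning that of the source files. */
proc copy in=moved out=native memtype=data noclone;
run;
```

Reading from `moved` still triggers the Cross Environment Data Access note once, but later work against `native` avoids it.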

Compressing the file will help with the transfer (although I think some transfer protocols can do that on the fly). GZIP can be used for a single file. ZIP lets you put multiple files into one "archive". You will need to reverse the process on the target machine once the file is copied there. SAS datasets (and also SAS CPORT files) will normally compress a LOT. I used to see the size reduced by 80-95% back in the day.
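If no external gzip utility is available, the compression can be sketched in SAS itself with the FILENAME ZIP access method. This assumes a release that supports the GZIP option (SAS 9.4M5 or later, as far as I know); the paths and filerefs are hypothetical, and the behaviour on very large files is worth testing before relying on it.

```sas
/* On Windows: gzip a single file (a transport file in this example) */
filename src 'C:\transfer\src.cpt' recfm=n;               /* binary source */
filename gz  zip 'C:\transfer\src.cpt.gz' gzip recfm=n;   /* gzip target   */

data _null_;
  rc = fcopy('src', 'gz');        /* binary copy through the gzip stream */
  if rc ne 0 then do;
    msg = sysmsg();
    put 'ERROR: copy failed: ' msg;
  end;
run;

/* On Linux, after the transfer: reverse the process */
filename gzin zip '/transfer/src.cpt.gz' gzip recfm=n;
filename out  '/transfer/src.cpt' recfm=n;

data _null_;
  rc = fcopy('gzin', 'out');
  if rc ne 0 then do;
    msg = sysmsg();
    put 'ERROR: copy failed: ' msg;
  end;
run;
```

The same pattern applies to a .sas7bdat file in place of the transport file; operating-system tools such as gzip/gunzip on the Linux side would do the same job.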