Hi,
I am trying to move some large datasets from my Windows environment to a new Linux environment. The encoding etc. has been updated to ensure that the datasets will work on Linux.
I read that CPORT and CIMPORT are the recommended way to create transport files and bring them in for migration, which is fine.
However, the datasets are quite large and the resulting transport files are also very big, so transferring them will take a long time.
I read somewhere that you can use GZIP or some other zip-type utility to compress the data and make it smaller (some suggested the ODS package method and others the GZIP filename method; either is fine for me).
The question I have is this: if I use GZIP with the supported approach, which is to zip the data rather than the transport file from CPORT, is it then as simple as moving it from the Windows system to the Linux system, unzipping it and then using it? If so, do I do away with CPORT and CIMPORT altogether?
Thanks,
N.
I can recommend the BasePlus package and two dedicated macros: %zipLibrary() and %unzipLibrary().
They both use internal SAS zip functionality, so no extra software is needed, and they are OS independent.
To use packages you need to download them and use the SAS Packages Framework. Check out the framework's repository to see how to work with it.
Bart
That was the assumption when I was designing and writing those macros (read the documentation; you will find examples there).
If you would like to get a "general overview" of the SAS Packages idea, check out this: https://github.com/yabwon/SAS_PACKAGES#recordings-and-presentations
This is a list of my presentations about SAS Packages; the one titled "A BasePlus Package for SAS" (SAS Explore 2022) is about BasePlus.
Bart
Please note Windows and Linux SAS datasets have different structures, so you can't just zip, copy and unzip them. Either you use CPORT / CIMPORT as you already know or you can use SAS/CONNECT PROC UPLOAD, which does dataset conversion on the fly.
Between Linux and Windows it shouldn't cause problems.
You can always create a "Linux" dataset under Windows with the OUTREP= data set option.
And if you remember to use UTF-8 encoding, you shouldn't have transcoding issues.
[EDIT:] Here is the link for OUTREP= documentation with the list of all systems: https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/ledsoptsref/n0p1yuyzltd52jn1dubaao0dlk2m.htm
Bart
@yabwon - Thanks for pointing out the OUTREP option. Will it also work OK on the target Linux server to read in a Windows dataset and write out a Linux-format one?
You can create a "Windows dedicated" data set while working on Linux, and a "Linux dedicated" data set while working on Windows.
When you are reading a dataset you don't need the OUTREP= option; SAS figures out what to do based on the data set metadata in the header.
Of course in all 4 possible cases (2 for creation, 2 for reading) SAS lets you know that "translation" happens with this little note:
NOTE: Data file LLLL.XXXXXX.DATA is in a format that is native to another host, or the file encoding does not match the
session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce
performance.
Bart
Assuming you've got a remote Windows server, a remote Linux server and a local machine: How do you plan to move the data? Directly server1->server2 or server1->local machine->server2?
If possible then server->server would be preferable and very likely with better network speed.
Instead of zipping your files you could also look into tools/protocols that compress data in transit. What will work for you depends on what's available on your Windows server. I believe compression in transit exists for sftp, scp and rsync.
Another option would be a file system or a data lake that's accessible from both environments.
And last but not least: If you've got SAS connect on both machines and the ports are open then you could also transfer the data directly via SAS (run in batch on the Linux side with nohup). This way you wouldn't need to bother about creation of transport files.
BTW.
Googling @Patrick's sentence directly: "SAS connect on both machines and the ports are open then you could also transfer the data directly via SAS" turns up this link to a paper about SAS/CONNECT and data transfer: https://support.sas.com/resources/papers/proceedings/proceedings/sugi24/Advtutor/p43-24.pdf
Bart
Hello @naz181
1. Compressing/uncompressing large datasets/transport files does take time and resources.
2. In my experience the optimal approach is to use ssh to move the transport files from Windows to Linux. You can also use FTP clients.
It may take time but the process is time-tested.
Working with SAS in multiple OS environments, I know of places where batch processes using ssh move large transport files from Linux to Windows on a routine basis.
Windows and Unix write compatible datasets. SAS can read a Windows-created dataset on a Unix machine and the reverse. But there is a performance hit. So you probably want to re-create the dataset on Unix once it is there.
Compressing the file will help with the transfer (although I think some transfer protocols can do that on the fly). GZIP can be used for a single file. ZIP can let you put multiple files into one "archive". You will need to reverse the process on the target machine once the file is copied there. SAS datasets (and also SAS CPORT files) will normally compress a LOT. I used to see the size reduced by 80-95 % back in the day.
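As a rough illustration of the single-file GZIP round trip described above, here is a minimal Python sketch (Python is not something used in this thread; it is chosen only because it runs the same way on Windows and Linux). The function names and the stand-in file are hypothetical. The checksum comparison is an added safety step, not something SAS requires, and the size reduction on real datasets will vary.

```python
import gzip
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_file(source: Path, target: Path) -> None:
    """Compress one file into a .gz file, streaming so memory use stays low."""
    with source.open("rb") as raw, gzip.open(target, "wb") as packed:
        shutil.copyfileobj(raw, packed)


def gunzip_file(source: Path, target: Path) -> None:
    """Reverse gzip_file: expand a .gz file back into the original bytes."""
    with gzip.open(source, "rb") as packed, target.open("wb") as raw:
        shutil.copyfileobj(packed, raw)


if __name__ == "__main__":
    # Stand-in for a dataset or CPORT transport file (hypothetical content).
    with tempfile.TemporaryDirectory() as workdir:
        original = Path(workdir) / "mydata.cpt"
        original.write_bytes(b"stand-in transport record\n" * 100_000)

        compressed = Path(workdir) / "mydata.cpt.gz"
        restored = Path(workdir) / "mydata_restored.cpt"

        gzip_file(original, compressed)    # run on the source machine
        gunzip_file(compressed, restored)  # run on the target machine

        print("original bytes:  ", original.stat().st_size)
        print("compressed bytes:", compressed.stat().st_size)
        print("checksums match: ", sha256_of(original) == sha256_of(restored))
```

Comparing checksums before and after the copy is a cheap way to confirm that the file was transferred in binary mode and arrived intact before handing it to SAS.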