Hi everyone,
I'm reaching out to the SAS community today seeking some guidance. Our team is currently in the process of transitioning from base SAS to SAS Studio and migrating our data storage from a local file server to Amazon S3.
We've successfully uploaded all our historical datasets (in .sas7bdat format) from the local server to S3. However, we're encountering an issue when trying to load/read this data directly from S3 into CAS using SAS Studio.
While I can perform various S3 bucket management tasks like creating, deleting, uploading data from CAS, and copying data between buckets, I'm unable to utilize PROC S3 to directly load data from S3 into CAS using .sas7bdat files.
While using PROC S3 if a file size smaller than 2MB then i am getting different error than above one ". I cross checked with my sas admin that we have read and write permissions.
Here's the interesting part: I can successfully use PROC CASUTIL's LOAD function to read data from S3 in formats like .sashdat, CSV, or Excel. It's specifically the .sas7bdat format that seems to be causing the issue.
Error message while using PROC CASUTIL :
I'm wondering if anyone in the community has faced similar challenges when working with .sas7bdat files in S3 and CAS. If so, any insights or solutions you could share would be greatly appreciated.
We are currently using SAS Viya LTS 2023.03
Thank you in advance for your time and assistance!
-Bhaskar
Going forward please create a new question for new requests. It keeps the Q&A leaner and cleaner and though makes it easier to find answers searching the forum.
The parquet libname engine is only available under Viya. You need first to copy your .sas7bdat files to the Viya side.
Below sample code demonstrates how to convert a .sas7bdat file to the Parquet file format. The libname parquet engine has imho still some quirks (like when it comes to deletion of the files) which I've already raised with SAS Tech Support. Right now it just needs some extra coding for some operations.
options fullstimer msglevel=i;
/* create directories for demo */
%let sv_dlcreatedir=%sysfunc(getoption(dlcreatedir));
options dlcreatedir;
libname _dir "&_userhome/demo";
libname _dir "&_userhome/demo/source";
libname _dir "&_userhome/demo/target";
libname _dir clear;
options &sv_dlcreatedir;
/* assign source library with .sas7bdat files */
libname source "&_userhome/demo/source";
/* create sample source data */
data source.class(index=(name sex));
set sashelp.class;
run;
data source.cars;
set sashelp.cars;
run;
/* assign target library for parquet files */
libname target parquet "&_userhome/demo/target";
/* copy data from source to target */
proc datasets lib=source nodetails;
copy
in=source
out=target
;
quit;
/** housekeeping: remove demo folder and all its content **/
%macro housekeeping();
/* delete SAS files in source */
proc datasets lib=source kill;
quit;
/* delete source folder */
filename source "%sysfunc(pathname(source))";
data _null_;
rc=fdelete("source");
run;
filename source clear;
libname source clear;
/* delete parquet files in target */
proc sql;
create table work.file_list as
select l.libname, lowcase(l.engine) as engine, lowcase(r.memname) as memname, l.path
from dictionary.libnames l
inner join dictionary.tables r
on l.libname='TARGET' and l.libname=r.libname
;
quit;
data _null_;
set work.file_list;
fname='tempfile';
length fully_qualified_path $2000;
fully_qualified_path=cats(path,'/',memname,'.',engine);
rc=filename(fname, fully_qualified_path);
if rc = 0 and fexist(fname) then
rc=fdelete(fname);
rc=filename(fname);
run;
/* delete target folder */
filename target "%sysfunc(pathname(target))";
data _null_;
rc=fdelete("target");
run;
filename target clear;
libname target clear;
/* delete demo folder */
filename demo "&_userhome/demo";
data _null_;
rc=fdelete('demo');
run;
filename demo clear;
%mend;
/* %housekeeping(); */
If you run above code as-is you should see the following in SAS Studio:
Please note
In my mind a data flow under Viya could look like:
I also suggest you search the SAS Communities with keyword "parquet". There are many useful Library Articles available.
Also read this discussion. It appears that currently when using the parquet engine with the compute libname statement a single file gets created but when using the cas caslib statement then multiple files get created under a folder with the name of the parquet file.
The libname and caslib can read each others structures but I guess this could create some challenges when it comes to managing the output under S3. From how I understand things right now one best sticks to a design where a specific set of parquet files only ever gets created either under Compute with a libname or under CAS with a caslib.
I'm not sure but I assume that creating the parquet file in chunks will better support multi-threading for reading the data into CAS (or writing from CAS to parquet).
Here and example how the parquet file looks like when created via caslib/proc casutil. You reference such a structure in code still as <caslib>.class_casutil.
Mostly based on documentation:
What I'd consider as part of a migration to Viya and S3:
Convert your source .sas7bdat files to the Parquet format. Not only should this save space and improve performance for data movement between S3 and SAS/CAS but it is also a format that's widely used and directly accessible by many other applications (for downstream processing).
Hello Patrick,
Thank you very much for taking the time to respond to my question. I truly appreciate your help and the suggestion of converting base SAS (.sas7bdat) files to Parquet format. It seems like a promising solution for working with my data in CAS.
While I'm exploring the conversion process, I came across the term "fully qualified path." I'd be grateful if you could provide a brief explanation of what a fully qualified path is or an example about it.
Your insights will be very helpful as I navigate this transition.
Thanks again for your assistance!
Sincerely,
Bhaskar
A fully qualified path is the full name of a FILE or DIRECTORY. If you are running on Unix then the name starts with the root node (a single /). If you are running on Windows then it will start with a drive letter and a colon then the slash( c:\....). Or for direct access to a network share without first assigning a drive letter to it a double slash and the hostname and sharename (\\hostname\sharename\....) .
The PROC S3 documentation does not seem to have a GET example but it does have a PUT example.
put "/u/marti/project/licj.csv" "/myBucket/licj.csv";
The GET statement wants the S3 location first and a local filename second, so just reverse the order.
get "/myBucket/licj.csv" "/u/marti/project/licj.csv" ;
Going forward please create a new question for new requests. It keeps the Q&A leaner and cleaner and though makes it easier to find answers searching the forum.
The parquet libname engine is only available under Viya. You need first to copy your .sas7bdat files to the Viya side.
Below sample code demonstrates how to convert a .sas7bdat file to the Parquet file format. The libname parquet engine has imho still some quirks (like when it comes to deletion of the files) which I've already raised with SAS Tech Support. Right now it just needs some extra coding for some operations.
options fullstimer msglevel=i;
/* create directories for demo */
%let sv_dlcreatedir=%sysfunc(getoption(dlcreatedir));
options dlcreatedir;
libname _dir "&_userhome/demo";
libname _dir "&_userhome/demo/source";
libname _dir "&_userhome/demo/target";
libname _dir clear;
options &sv_dlcreatedir;
/* assign source library with .sas7bdat files */
libname source "&_userhome/demo/source";
/* create sample source data */
data source.class(index=(name sex));
set sashelp.class;
run;
data source.cars;
set sashelp.cars;
run;
/* assign target library for parquet files */
libname target parquet "&_userhome/demo/target";
/* copy data from source to target */
proc datasets lib=source nodetails;
copy
in=source
out=target
;
quit;
/** housekeeping: remove demo folder and all its content **/
%macro housekeeping();
/* delete SAS files in source */
proc datasets lib=source kill;
quit;
/* delete source folder */
filename source "%sysfunc(pathname(source))";
data _null_;
rc=fdelete("source");
run;
filename source clear;
libname source clear;
/* delete parquet files in target */
proc sql;
create table work.file_list as
select l.libname, lowcase(l.engine) as engine, lowcase(r.memname) as memname, l.path
from dictionary.libnames l
inner join dictionary.tables r
on l.libname='TARGET' and l.libname=r.libname
;
quit;
data _null_;
set work.file_list;
fname='tempfile';
length fully_qualified_path $2000;
fully_qualified_path=cats(path,'/',memname,'.',engine);
rc=filename(fname, fully_qualified_path);
if rc = 0 and fexist(fname) then
rc=fdelete(fname);
rc=filename(fname);
run;
/* delete target folder */
filename target "%sysfunc(pathname(target))";
data _null_;
rc=fdelete("target");
run;
filename target clear;
libname target clear;
/* delete demo folder */
filename demo "&_userhome/demo";
data _null_;
rc=fdelete('demo');
run;
filename demo clear;
%mend;
/* %housekeeping(); */
If you run above code as-is you should see the following in SAS Studio:
Please note
In my mind a data flow under Viya could look like:
I also suggest you search the SAS Communities with keyword "parquet". There are many useful Library Articles available.
Also read this discussion. It appears that currently when using the parquet engine with the compute libname statement a single file gets created but when using the cas caslib statement then multiple files get created under a folder with the name of the parquet file.
The libname and caslib can read each others structures but I guess this could create some challenges when it comes to managing the output under S3. From how I understand things right now one best sticks to a design where a specific set of parquet files only ever gets created either under Compute with a libname or under CAS with a caslib.
I'm not sure but I assume that creating the parquet file in chunks will better support multi-threading for reading the data into CAS (or writing from CAS to parquet).
Here and example how the parquet file looks like when created via caslib/proc casutil. You reference such a structure in code still as <caslib>.class_casutil.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.