BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
bhaskarkothavt
Obsidian | Level 7

Hi everyone,

I'm reaching out to the SAS community today seeking some guidance. Our team is currently in the process of transitioning from base SAS to SAS Studio and migrating our data storage from a local file server to Amazon S3.

We've successfully uploaded all our historical datasets (in .sas7bdat format) from the local server to S3. However, we're encountering an issue when trying to load/read this data directly from S3 into CAS using SAS Studio.


While I can perform various S3 bucket management tasks like creating, deleting, uploading data from CAS, and copying data between buckets, I'm unable to utilize PROC S3 to directly load data from S3 into CAS using .sas7bdat files.

proc s3  keyid= &accesskey secret=&scaccesskey region=&regi;  
    get "/viya-data/AAI/Pertape/datasets/pt31mar24.sas7bdat"    "/nfsshare/data/sasdata/S3Transit/pt31mar24"; 
run;

bhaskarkothavt_1-1715001385626.png

 

While using PROC S3 if a file size smaller than 2MB then i am getting different error than above one ". I cross checked with my sas admin that we have read and write permissions.

bhaskarkothavt_2-1715001521674.png

 



Here's the interesting part: I can successfully use PROC CASUTIL's LOAD function to read data from S3 in formats like .sashdat, CSV, or Excel. It's specifically the .sas7bdat format that seems to be causing the issue.


caslib amz dataSource=(srcType="s3"  accesskeyid=&accesskey secretaccesskey=&scaccesskey region=&regi 
                  bucket="viya-data"  objectpath="/AAI/Pertape/datasets/");
proc casutil incaslib=amz outcaslib="S3Transit";
   load casdata="pt31mar24.sas7bdat"  casout="tl_website_30apr24";
quit;

Error message while using PROC CASUTIL : 

bhaskarkothavt_0-1715001229232.png

 


I'm wondering if anyone in the community has faced similar challenges when working with .sas7bdat files in S3 and CAS. If so, any insights or solutions you could share would be greatly appreciated.

We are currently using SAS Viya LTS 2023.03

Thank you in advance for your time and assistance!
-Bhaskar

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

Going forward please create a new question for new requests. It keeps the Q&A leaner and cleaner and though makes it easier to find answers searching the forum.

 

The parquet libname engine is only available under Viya. You need first to copy your .sas7bdat files to the Viya side.

Below sample code demonstrates how to convert a .sas7bdat file to the Parquet file format. The libname parquet engine has imho still some quirks (like when it comes to deletion of the files) which I've already raised with SAS Tech Support. Right now it just needs some extra coding for some operations.

Spoiler
options fullstimer msglevel=i;

/* create directories for demo */
%let sv_dlcreatedir=%sysfunc(getoption(dlcreatedir));
options dlcreatedir;
libname _dir "&_userhome/demo";
libname _dir "&_userhome/demo/source";
libname _dir "&_userhome/demo/target";
libname _dir clear;
options &sv_dlcreatedir;

/* assign source library with .sas7bdat files */
libname source "&_userhome/demo/source";

/* create sample source data */
data source.class(index=(name sex));
  set sashelp.class;
run;
data source.cars;
  set sashelp.cars;
run;

/* assign target library for parquet files */
libname target parquet "&_userhome/demo/target";

/* copy data from source to target */
proc datasets lib=source nodetails;
  copy
    in=source
    out=target
    ;
quit;

/** housekeeping: remove demo folder and all its content **/
%macro housekeeping();
  /* delete SAS files in source */
  proc datasets lib=source kill;
  quit;
  /* delete source folder */
  filename source "%sysfunc(pathname(source))";
  data _null_;
    rc=fdelete("source");
  run;
  filename source clear;
  libname source clear;
  
  /* delete parquet files in target */
  proc sql;
    create table work.file_list as
    select l.libname, lowcase(l.engine) as engine, lowcase(r.memname) as memname, l.path
    from dictionary.libnames l 
    inner join dictionary.tables r 
    on l.libname='TARGET' and l.libname=r.libname
    ;
  quit;
  
  data _null_;
    set work.file_list;
    fname='tempfile';
    length fully_qualified_path $2000;
    fully_qualified_path=cats(path,'/',memname,'.',engine);
    rc=filename(fname, fully_qualified_path);
    if rc = 0 and fexist(fname) then
       rc=fdelete(fname);
    rc=filename(fname);
  run;
  /* delete target folder */
  filename target "%sysfunc(pathname(target))";
  data _null_;
    rc=fdelete("target");
  run;
  filename target clear;
  libname target clear;

  /* delete demo folder */
  filename demo "&_userhome/demo";
  data _null_;
    rc=fdelete('demo');
  run;
  filename demo clear;
%mend;
/* %housekeeping(); */

If you run above code as-is you should see the following in SAS Studio:

Patrick_0-1715138204818.png

 

Please note

  • Parquet files are immutable meaning Write Once. You can't insert, delete or update a parquet file. You can only fully replace it.
  • Also consult the docu. It appears there are libname options allowing you to read/write parquet files directly from/to S3. STORAGE_PLATFORM= LIBNAME Statement Option

In my mind a data flow under Viya could look like:

  • Upstream source data (any format but could be parquet).
  • Data manipulation ("ETL") mainly under Compute (your "SAS9.4" under Viya). 
  • Intermediary data still .sas7bdat. Endpoint of process for load into CAS or for downstream non-SAS applications potentially parquet and potentially written to S3 (only if table doesn't require further change later on).
  • Load into CAS for analysis and reporting

I also suggest you search the SAS Communities with keyword "parquet". There are many useful Library Articles available.

 

Also read this discussion. It appears that currently when using the parquet engine with the compute libname statement a single file gets created but when using the cas caslib statement then multiple files get created under a folder with the name of the parquet file.

The libname and caslib can read each others structures but I guess this could create some challenges when it comes to managing the output under S3. From how I understand things right now one best sticks to a design where a specific set of parquet files only ever gets created either under Compute with a libname or under CAS with a caslib.

I'm not sure but I assume that creating the parquet file in chunks will better support multi-threading for reading the data into CAS (or writing from CAS to parquet).  

 

Here and example how the parquet file looks like when created via caslib/proc casutil. You reference such a structure in code still as <caslib>.class_casutil.

Patrick_0-1715140519061.png

 

 

View solution in original post

6 REPLIES 6
Patrick
Opal | Level 21

Mostly based on documentation:

  • The caslib statement with the S3 engine doesn't support .sas7bdat files. Path-Based Data Source Types and Options
    • I guess what this means for .sas7bdat: Use Proc S3 to copy the file from the S3 bucket to the file system and then use a caslib that points to this path for server side load via Proc Casutil (or the table.addTable action).
  • Based on the  docu for Proc S3 the target for the Get statement needs to be the fully qualified path AND file name (in your screenshot it's only the path).

What I'd consider as part of a migration to Viya and S3:

Convert your source .sas7bdat files to the Parquet format. Not only should this save space and  improve performance for data movement between S3 and SAS/CAS but it is also a format that's widely used and directly accessible by many other applications (for downstream processing). 

 

 

bhaskarkothavt
Obsidian | Level 7

Hello Patrick,

Thank you very much for taking the time to respond to my question. I truly appreciate your help and the suggestion of converting base SAS (.sas7bdat) files to Parquet format. It seems like a promising solution for working with my data in CAS.

 

While I'm exploring the conversion process, I came across the term "fully qualified path." I'd be grateful if you could provide a brief explanation of what a fully qualified path is or an example about it.

 

Your insights will be very helpful as I navigate this transition.

Thanks again for your assistance!

Sincerely,

Bhaskar

Tom
Super User Tom
Super User

A fully qualified path is the full name of a FILE or DIRECTORY.  If you are running on Unix then the name starts with the root node (a single /).  If you are running on Windows then it will start with a drive letter and a colon then the slash( c:\....).  Or for direct access to a network share without first assigning a drive letter to it a double slash and the hostname and sharename (\\hostname\sharename\....) .

 

The PROC S3 documentation does not seem to have a GET example but it does have a PUT example.  

put "/u/marti/project/licj.csv" "/myBucket/licj.csv";

The GET statement wants the S3 location first and a local filename second, so just reverse the order.

get "/myBucket/licj.csv" "/u/marti/project/licj.csv" ;

 

bhaskarkothavt
Obsidian | Level 7
Patrick - Could you please help me with the code for converting .sas7bdat files to Parquet format using base sas 9.4.

Best Regards,
Bhaskar
Patrick
Opal | Level 21

Going forward please create a new question for new requests. It keeps the Q&A leaner and cleaner and though makes it easier to find answers searching the forum.

 

The parquet libname engine is only available under Viya. You need first to copy your .sas7bdat files to the Viya side.

Below sample code demonstrates how to convert a .sas7bdat file to the Parquet file format. The libname parquet engine has imho still some quirks (like when it comes to deletion of the files) which I've already raised with SAS Tech Support. Right now it just needs some extra coding for some operations.

Spoiler
options fullstimer msglevel=i;

/* create directories for demo */
%let sv_dlcreatedir=%sysfunc(getoption(dlcreatedir));
options dlcreatedir;
libname _dir "&_userhome/demo";
libname _dir "&_userhome/demo/source";
libname _dir "&_userhome/demo/target";
libname _dir clear;
options &sv_dlcreatedir;

/* assign source library with .sas7bdat files */
libname source "&_userhome/demo/source";

/* create sample source data */
data source.class(index=(name sex));
  set sashelp.class;
run;
data source.cars;
  set sashelp.cars;
run;

/* assign target library for parquet files */
libname target parquet "&_userhome/demo/target";

/* copy data from source to target */
proc datasets lib=source nodetails;
  copy
    in=source
    out=target
    ;
quit;

/** housekeeping: remove demo folder and all its content **/
%macro housekeeping();
  /* delete SAS files in source */
  proc datasets lib=source kill;
  quit;
  /* delete source folder */
  filename source "%sysfunc(pathname(source))";
  data _null_;
    rc=fdelete("source");
  run;
  filename source clear;
  libname source clear;
  
  /* delete parquet files in target */
  proc sql;
    create table work.file_list as
    select l.libname, lowcase(l.engine) as engine, lowcase(r.memname) as memname, l.path
    from dictionary.libnames l 
    inner join dictionary.tables r 
    on l.libname='TARGET' and l.libname=r.libname
    ;
  quit;
  
  data _null_;
    set work.file_list;
    fname='tempfile';
    length fully_qualified_path $2000;
    fully_qualified_path=cats(path,'/',memname,'.',engine);
    rc=filename(fname, fully_qualified_path);
    if rc = 0 and fexist(fname) then
       rc=fdelete(fname);
    rc=filename(fname);
  run;
  /* delete target folder */
  filename target "%sysfunc(pathname(target))";
  data _null_;
    rc=fdelete("target");
  run;
  filename target clear;
  libname target clear;

  /* delete demo folder */
  filename demo "&_userhome/demo";
  data _null_;
    rc=fdelete('demo');
  run;
  filename demo clear;
%mend;
/* %housekeeping(); */

If you run above code as-is you should see the following in SAS Studio:

Patrick_0-1715138204818.png

 

Please note

  • Parquet files are immutable meaning Write Once. You can't insert, delete or update a parquet file. You can only fully replace it.
  • Also consult the docu. It appears there are libname options allowing you to read/write parquet files directly from/to S3. STORAGE_PLATFORM= LIBNAME Statement Option

In my mind a data flow under Viya could look like:

  • Upstream source data (any format but could be parquet).
  • Data manipulation ("ETL") mainly under Compute (your "SAS9.4" under Viya). 
  • Intermediary data still .sas7bdat. Endpoint of process for load into CAS or for downstream non-SAS applications potentially parquet and potentially written to S3 (only if table doesn't require further change later on).
  • Load into CAS for analysis and reporting

I also suggest you search the SAS Communities with keyword "parquet". There are many useful Library Articles available.

 

Also read this discussion. It appears that currently when using the parquet engine with the compute libname statement a single file gets created but when using the cas caslib statement then multiple files get created under a folder with the name of the parquet file.

The libname and caslib can read each others structures but I guess this could create some challenges when it comes to managing the output under S3. From how I understand things right now one best sticks to a design where a specific set of parquet files only ever gets created either under Compute with a libname or under CAS with a caslib.

I'm not sure but I assume that creating the parquet file in chunks will better support multi-threading for reading the data into CAS (or writing from CAS to parquet).  

 

Here and example how the parquet file looks like when created via caslib/proc casutil. You reference such a structure in code still as <caslib>.class_casutil.

Patrick_0-1715140519061.png

 

 

bhaskarkothavt
Obsidian | Level 7
My apologies I will keep in mind and will new request if it something new and thanks for detailed explanation much appreciated.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 500 views
  • 4 likes
  • 3 in conversation