03-11-2025
Delgoffe_MCRI
Obsidian | Level 7
Member since
06-22-2017
- 18 Posts
- 3 Likes Given
- 0 Solutions
- 2 Likes Received
Activity Feed for Delgoffe_MCRI
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-11-2025 06:21 PM
- Liked Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness for SASKiwi. 03-11-2025 05:23 PM
- Liked Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness for SASKiwi. 03-11-2025 05:22 PM
- Got a Like for Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness. 03-11-2025 05:14 PM
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-11-2025 04:18 PM
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-11-2025 09:46 AM
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-05-2025 12:10 PM
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-05-2025 11:14 AM
- Posted Re: Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 02:40 PM
- Tagged Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:53 AM
- Tagged Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:52 AM
- Tagged Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:52 AM
- Posted Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:49 AM
- Tagged Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:49 AM
- Tagged Interacting with AWS Redshift: Exploring better ways to avoid slowness on SAS Programming. 03-03-2025 11:49 AM
- Posted Re: Redcap export to SAS on Statistical Procedures. 05-31-2024 10:06 AM
- Got a Like for Re: Using API Playground in REDCap to import data into SAS. 09-26-2023 05:03 PM
- Posted Re: Redcap export to SAS on Statistical Procedures. 11-08-2022 01:39 PM
- Tagged Re: Redcap export to SAS on Statistical Procedures. 11-08-2022 01:39 PM
- Posted Re: Difficulties with REDCap data in SAS on SAS Data Management. 11-08-2022 01:33 PM
03-11-2025
06:21 PM
I did see that. I was able to confirm the credentials, but only by using them through the other CSV-based process. This sounds like an issue with locks, but I haven't been able to resolve it using the locking-method options either.
03-11-2025
04:18 PM
@yabwon It seems I'm getting closer to a solution, but I've encountered an error.
I was able to determine that running the following:
%sysexec(aws sso login --profile &sso_profile.);
will give you an authentication code, have you log in through a browser UI with that one-time code, and then drop a file into the .aws cache directory (C:\Users\[username]\.aws\sso\cache\) containing temporary token and secret information.
I was NOT able to find a way to "point" SAS to that information for its use, but it is a JSON file, so you can read it in like this:
filename cred_in "C:\Users\[username]\.aws\sso\cache\[filename].json";
libname cred_lib JSON fileref=cred_in;
and get data containing the clientSecret, accessToken, and region fields, which can be read into macro variables and used like this:
data _NULL_;
set cred_lib.root;
call symput ("SECRET",strip(clientSecret));
call symput ("TOKEN",strip(accessToken));
call symput ("REGION",strip(region));
run;
libname AWSWORKB
redshift
server="&server."
database=&database.
port=5439
conopts="encryptionmethod=6;validateservercertificate=0;"
schema="&schema."
user=&awsdwid.
password=&awsdwpwd.
dbcommit = 0 insertbuff = 10000 readbuff=10000
bulkunload=yes
bulkload=yes
bl_bucket="&bucket."
bl_token="&TOKEN."
bl_secret="&SECRET."
;
When I go to use that information/libname I get this in the log:
REDSHIFT: AUTOCOMMIT is YES for connection 7
REDSHIFT_34: Prepared: on connection 7
SELECT TOP 0 * FROM "AWS_SCHEMA".DELGOFFB_BULKTEST2
Summary Statistics for REDSHIFT are:
Total SQL prepare seconds were: 0.026091
Total seconds used by the REDSHIFT ACCESS engine were 0.026230
REDSHIFT: AUTOCOMMIT is YES for connection 9
NOTE: SAS variable labels, formats, and lengths are not written to DBMS tables.
REDSHIFT_35: Executed: on connection 9
CREATE TABLE "AWS_SCHEMA".DELGOFFB_BULKTEST2 (dept_num NUMERIC(11),dept_name VARCHAR(30),dept_specialty_group_desc VARCHAR(60),appt_desk_id INTEGER,mapped_to_dept_num NUMERIC(11),active_status_code VARCHAR(1),dept_type_code VARCHAR(1),company_name VARCHAR(180),mc_facility_code VARCHAR(3),facility_num NUMERIC(11),aou_dept_group VARCHAR(17))
REDSHIFT: Bulkload seconds used for setup: 0.029634
REDSHIFT: Bulkload seconds used for file open: 0.003500
NOTE: There were 722 observations read from the data set WORK.TEST_DATA.
NOTE: The data set AWSWORKB.DELGOFFB_BULKTEST2 has 722 observations and 11 variables.
REDSHIFT: Bulkload seconds used for writes: 0.007592
REDSHIFT: Bulkload seconds used for file close: 0.005716
REDSHIFT: Bulkload seconds used for file delete: 0.472694
REDSHIFT: Bulkload total seconds used: 0.519328
Summary Statistics for REDSHIFT are:
Total SQL execution seconds were: 0.773501
Total SQL prepare seconds were: 0.026091
Total seconds used by the REDSHIFT ACCESS engine were 1.321699
ERROR: Message from TKS3: The file is already locked by another user
NOTE: DATA statement used (Total process time):
real time 1.85 seconds
cpu time 0.26 seconds
and when I open the table I find all the variables but no rows. The code I'm running is:
options sastrace=',,,ds' sastraceloc=saslog nostsuffix;
data AWSWORKB.DELGOFFB_BULKTEST2
(bulkload=yes
bl_bucket="&bucket.\upload"
bl_token="&TOKEN."
bl_secret="&SECRET."
bl_region='us-east-2'
bl_use_ssl=yes
);
set WORK.TEST_DATA;
run;
Have you or anyone else seen this error? It looks like it's doing some work with the bulk load, so I don't think it's an access issue - but there's no other "Users" that would have that open. I even tried changing the names, but I think it's seeing its own shadow (like a groundhog).
03-11-2025
09:46 AM
1 Like
This was a pretty good increase in performance!
By adding these options to the existing libname:
dbcommit = 0
insertbuff = 10000
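A minimal sketch of what the adjusted libname looks like (the connection values are the same kind of placeholders as in my earlier posts):
/* Existing Redshift libname with the two new options added;
   server, database, schema, and credential values are placeholders */
libname AWSLIB
redshift
server="&server."
database=&database.
port=5439
conopts="encryptionmethod=6;validateservercertificate=0;"
schema="&schema."
user=&awsdwid.
password=&awsdwpwd.
dbcommit = 0       /* commit once after all rows instead of after every row */
insertbuff = 10000 /* send rows to Redshift in buffers of 10,000 per insert */
;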
I'm seeing 29 seconds for 315K records instead of 14 minutes with the same syntax, compared to the previous libname without those options:
PROC SQL FEEDBACK;
INSERT INTO OLDAWSCONNECT.BED281_&ReqNum._MHNs
SELECT DISTINCT
VAR1
,VAR2
FROM SASLIBNAME.DATASET
;
QUIT;
Then I see 27 seconds when using this syntax to create the blank table, create the SAS table, and then load records in:
PROC SQL FEEDBACK;
&CONNECT_INFO.;
EXECUTE (DROP TABLE IF EXISTS NEWAWS.BED281_&REQNUM._MHNs)&CONNECT_BY. ;
EXECUTE (
CREATE TABLE NEWAWS.BED281_&ReqNum._MHNs
("VAR1" INTEGER NOT NULL
,"VAR2" INTEGER NOT NULL
)
DISTKEY (VAR1)
) &CONNECT_BY. ;
CREATE TABLE SASLIBNAME.DATASET AS
SELECT DISTINCT
VAR1
,VAR2
FROM SASLIBNAME.DATA_WITH_MANYMORE_COLUMNS
;
QUIT;

proc datasets library = NEWAWS nolist;
append base = BED281_&ReqNum._MHNs data = SASLIBNAME.DATASET;
run;
quit;
So the DBCOMMIT and INSERTBUFF options on the libname have helped a lot; even if I don't get the S3 bucket options figured out, this could be a good fallback.
- Tags:
- DBCOMMIT
- INSERTBUFF
03-05-2025
12:10 PM
These tips worked well for increasing the speed and success of the CSV middleman - but that part was already sufficiently fast. I've made these adjustments to our CSV process, but I'd prefer not to use a CSV at all. We're just trying to "move"/"load" data currently in SAS onto Redshift, or "query"/"pull" data from Redshift into SAS, ideally without any hard coding or special handling for the contents of the dataset.
I'd like the code required to do this to be as simple as one of these, but to run as fast as the CSV-via-S3 process.
data AWSLIBNAME.AWS_TABLE_NAME;
set SASLIBNAME.SAS_TABLE_NAME;
run;
PROC SQL;
create table AWSLIBNAME.AWS_TABLE_NAME as
SELECT * FROM SASLIBNAME.SAS_TABLE_NAME;
QUIT;
Related to "1) Don't bother to include the header row, since you already know what you are sending.": in the case of the CSV process this is true and a helpful tip. In practice, what we are sending changes each time and may not be "known" in advance - so I suspect you mean "known" in the sense that we look at the data and create the table first?
It is a downside that we have to create the table in AWS ahead of time to use the CSV. Right now we're using metadata mining to do that part programmatically (a rough sketch is below), but I was hoping that if we removed the CSV middleman, we could also remove the "knowing what we're sending" step. I'm hoping SAS would make those data type decisions for us, but maybe I'm misunderstanding.
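For what it's worth, the general shape of that metadata mining looks like this (library, dataset name, and the type mapping are simplified placeholders, not our production code):
/* Rough sketch: build a Redshift column list from SAS metadata for a
   placeholder dataset WORK.TEST_DATA; num -> FLOAT8 is a simplification */
proc sql noprint;
select catx(' ',
            name,
            case when type = 'num' then 'FLOAT8'
                 else cats('VARCHAR(', length, ')')
            end)
   into :col_defs separated by ', '
   from dictionary.columns
   where libname = 'WORK' and memname = 'TEST_DATA';
quit;
%put CREATE TABLE aws_schema.test_data (&col_defs.);
/* The generated CREATE TABLE text can then be sent through
   EXECUTE ( ... ) BY <connection> as in the pass-through examples */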
Related to "embedded end of line characters", I'm still having spill over issues that look to be caused by line feeds and data type mismatches:
ERROR: CLI execute error: [SAS][ODBC Redshift Wire Protocol driver][Redshift]ERROR: Load into table 'delgoffb_bulktest' failed.
Check 'stl_load_errors' system table for details.(File /home/ec2-user/padb/src/pg/src/backend/commands/commands_copy.c; Line
769; Routine CheckMaxRowError; )
When running:
PROC SQL;
&RDWWORKCONNECT;
CREATE TABLE LAST_RUN_LOADERRORS AS
SELECT DISTINCT * &RDWFROMCONNECT (
select * from stl_load_errors
)
&RDWCLOSE
having max(starttime) = starttime /*Select Only Those from Most Recent Starttime*/
order by starttime desc
;QUIT;
The error is:
" CR ham and beef" is not an acceptable value to the field 'fake_identifier' (which is defined as integer)
You see this in the interim CSV:
fake_identifier,rdw_date,wonky_field
800231,12/30/2024,tab
800230,12/30/2024,", CR ham and beef",,
To avoid this I usually parse the data using code like this in a data step (but that all increases run time) to remove anything that could potentially cause an issue or doesn't need to be there in the final value (like HTML or plain text tags):
[more_code]
%if &CHAR_CLEAN. = Y %then %do;
*The default cleaning to convert the delimiter to
something else in the data should be performed unless
you are confident it does not exist in all text fields;
array clean_fancy _CHAR_;
do over Clean_fancy;
%if &FREE_TEXT.=Y %then %do;
*if there are no fields that have the opportunity
to contain these issues, do not run this extra code
as it will increase run time unnecessarily.;
*remove plain-text (HTML-entity) versions of tags and special
characters so the tag removal below also catches them;
CLEAN_FANCY = tranwrd(CLEAN_FANCY,'&lt;','<');
CLEAN_FANCY = tranwrd(CLEAN_FANCY,'&gt;','>');
CLEAN_FANCY = tranwrd(CLEAN_FANCY,'&amp;','&');
CLEAN_FANCY = tranwrd(CLEAN_FANCY,'&nbsp;',' ');
*remove any html tags;
retain rx1;
if _n_=1 then rx1=prxparse("s/<.*?>//");
call prxchange(rx1,-1,Clean_Fancy);
drop rx1;
*strip blanks;
CLEAN_FANCY=compress(CLEAN_FANCY,'0A090B0D'x);
CLEAN_FANCY=strip(CLEAN_FANCY);
%end;
*Translate the delimiter in text fields to something else to avoid upload issues;
CLEAN_FANCY=translate(CLEAN_FANCY,"'","`");
end; *end array;
%end;
[more code]
It works, but at the cost of many more lines of code, changes to the data, and processing time.
03-05-2025
11:14 AM
@yabwon I received feedback that we are not allowed to create the long term access keys, but found some documentation from AWS on alternatives:
https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds-programmatic-access.html#security-creds-alternatives-to-long-term-access-keys
I'm aware that we have IAM roles set up, and I see some other options in the BL_KEY= documentation (https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/acreldb/n0i0qwjjbfu090n1mi1xxwsc2mbf.htm) that seem to reference these temporary tokens, but I'm still not sure how to retrieve those tokens programmatically for use here. It's also unclear which options, with which values, are required or belong together; using tokens looks like it should be only one path, and I suspect a different option (or set of options) that uses the IAM role instead of tokens is what's needed, but I don't see one.
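One direction I'm considering (untested sketch - it assumes a newer AWS CLI that has the "aws configure export-credentials" subcommand, and that piping commands from SAS is allowed as in my other posts) is to pull the temporary credentials straight from the CLI and park them in macro variables for the BL_KEY=/BL_SECRET=/BL_TOKEN= options:
/* Untested sketch: read the temporary credentials behind the SSO profile
   from the AWS CLI (v2.9+); all names here are placeholders */
filename awscred pipe "aws configure export-credentials --profile &sso_profile. --format env";
data _null_;
length line value $4000;
infile awscred truncover;
input line $4000.;
line = strip(tranwrd(line, 'export ', ''));  /* drop the leading "export " */
if index(line, '=') = 0 then delete;         /* skip anything unexpected   */
value = substr(line, index(line, '=') + 1);  /* keep '=' padding inside the token */
if scan(line, 1, '=') = 'AWS_ACCESS_KEY_ID' then call symputx('AWS_KEY', value);
else if scan(line, 1, '=') = 'AWS_SECRET_ACCESS_KEY' then call symputx('AWS_SECRET', value);
else if scan(line, 1, '=') = 'AWS_SESSION_TOKEN' then call symputx('AWS_TOKEN', value);
run;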
03-03-2025
02:40 PM
@yabwon
I have looked through the syntax and tried to come up with something that worked with these options before, but without success. If there is guidance on where to find the values for the different settings, that may be where things went wrong - I'm not familiar enough with how all of the configuration works here, and there may be steps I missed.
Just in case, I did try this again, but I still got the secret key error shown below.
data &libname..delgoffb_bulktest(
bulkload=yes
bl_bucket=&bucket.
bl_use_ssl=yes
bl_region='us-east-2');
set sashelp.class;
run;
ERROR: Message from TKS3: No secret key specified.
But I'm not aware of any "secret key" process we use.
03-03-2025
11:49 AM
NOTE: Throughout this post I am using what look like macro variables as placeholders for information that may change or that is security-sensitive. We do not use macro variables in all of these places in the live code, so some of them may not use the appropriate masking functions, quotes, etc. Read them as "[add this information]"-style placeholders.
We are using a libname statement to connect to an AWS Redshift Environment, where we pull and push data.
/*Connect to S3*/
%sysexec(aws sso login --profile &sso_profile.);
/*Connect to AWS Redshift using Libname*/
libname &library_name.
redshift
server="&secretinformation..us-east-2.redshift.amazonaws.com"
database=&database.
port=5439
conopts="encryptionmethod=6;validateservercertificate=0;"
schema="&schema_name."
user=&awsdwid.
password=&awsdwpwd.;
We commonly use PROC SQL for a variety of functions, either via the libname or via pass-through, like so:
/*Run a Query on AWS using Pass Thru*/
PROC SQL;
CONNECT USING &LIBNAME. AS QSELECT;
EXECUTE
(drop table if exists &table_name.)
BY QSELECT;
QUIT;
/*Run a Query in SAS using Libname*/
proc sql;
drop table &libname..&tablename.;
quit;
But these run very slowly (e.g., 3 hours for 3.1M records) using the libname method when doing SELECT-FROM and INSERT-INTO.
Right now, we are using the following method (with prep and metadata-parsing code on either side) to speed up this exchange using S3 buckets (from 3 hours to 5 minutes for those 3.1M records). It DOES run faster, but it requires a CSV middleman that has two data integrity issues in my book: 1) it strips metadata, which forces extra processing to recreate it in AWS, and 2) the CSV files need to be pre-processed to remove or translate characters that cause reading errors - and that I do not like.
/* export sas dataset as a CSV */
proc export
data=sasdata_for_export
outfile="&csvdir.&csvname..csv"
dbms=dlm
replace;
delimiter='`'; *Use the back tick to avoid issues with using a comma delimiter (free text fields cause line feed issues with that delimiter);
putnames=NO; *do not add names here - use a create table statement in AWS to create the table ahead of time;
run;
/* load the CSV to s3 using the cp (copy) command*/
%sysexec(aws s3 cp "&csvdir.&csvname..csv" "s3://&bucket./upload/&csvname..csv" --profile &sso_profile.);
/* copy the CSV from s3 to redshift */
proc sql;
CONNECT USING &LIBNAME. AS QSELECT;
execute (
copy res_work.%unquote("&rstblname.")
from %unquote(%bquote('s3://&bucket./upload/&csvname..csv'))
iam_role &iam_role.
delimiter '`'
dateformat 'auto'
acceptinvchars as ' '
maxerror 10
ROUNDEC;
) BY QSELECT;
QUIT;
I have tried to use PROC S3, but couldn't come up with a syntax that worked.
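Roughly, the shape I was attempting looked like this (key, secret, bucket, and region values are placeholders, and I'm not confident the options are right - this is the part I couldn't get to work):
/* Attempted PROC S3 upload of the CSV; all values are placeholders
   and this never ran successfully for me */
proc s3 keyid="&aws_key_id." secret="&aws_secret." region="useast2";
put "&csvdir.&csvname..csv" "/&bucket./upload/&csvname..csv";
run;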
Can anyone provide code that would use either PROC S3, or some other mechanism to avoid the CSV middle man but retain the speeds of the S3 buckets?
Anyone who answers with something that works would also be invited to co-author a Users Group paper/presentation on this.
I am programming using the following:
--- SAS Software ---
SAS Enterprise Guide version: 8.5 (8.5.0.203) (64-bit)
SAS Base version: 9.4 (TS1M7)
SAS System version: 9412.0.24363.19622
Analytical products:
SAS/STAT 15.2
SAS/ETS 15.2
SAS/IML 15.2
--- Operating System ---
Name: Windows 10 Enterprise
Version: 10.0
Default web browser: Google Chrome
05-31-2024
10:06 AM
@sassy_seb Check out my export macro. You can get the latest version through my REDCap survey in the previous comments, or by reading my paper(s) on LexJansen.com by searching "REDCap Your SAS Friend" (the most recent one should be listed first). To answer your question more pointedly: I've circumvented those issues by first downloading the data dictionary via the API and using it to define the dataset's features, instead of relying on the export code REDCap provides. This is even more important now with the use of style characters in values. I will also be presenting new content on REDCap at the SESUG Conference in Bethesda, September 22-24, 2024. Registration is still open - as is the call for content. https://sesug.org/SESUG2024/index.php
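Not the macro itself, but the general idea of the data-dictionary pull looks something like this (the URL and token are placeholders for your own project):
/* Sketch only: download the REDCap data dictionary (metadata) via the API.
   Replace the URL and token with your own project's values. */
filename dd temp;
proc http
url="https://redcap.example.edu/api/"
method="POST"
in='token=YOUR_API_TOKEN&content=metadata&format=csv&returnFormat=csv'
out=dd;
run;
/* Read the dictionary so field names, types, and labels can drive
   how the exported data set is defined */
proc import datafile=dd out=work.redcap_dict dbms=csv replace;
guessingrows=max;
run;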
11-08-2022
01:39 PM
Yes - it uses regular expressions to remove HTML and carriage returns, and it fully parses the REDCap data dictionary to provide labels and formats for the SAS dataset it creates. At times you may need to control how it interprets a variable's type based on the existing data (ESPECIALLY with missing data codes), but the next version under development will replace text missing data codes with acceptable numeric values before import to avoid that...so stay tuned!
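For anyone curious, the label side of that idea is conceptually something like this (dataset names are placeholders, and it glosses over labels containing quotes, ampersands, or percent signs):
/* Sketch of the concept only: use the downloaded dictionary
   (field_name, field_label) to label the exported data set.
   WORK.REDCAP_DICT and WORK.REDCAP_DATA are placeholder names. */
proc sql noprint;
select cats(field_name, '="', field_label, '"')
   into :label_list separated by ' '
   from work.redcap_dict;
quit;
proc datasets lib=work nolist;
modify redcap_data;
label &label_list.;
quit;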
11-08-2022
01:33 PM
I have recently published a macro to export REDCap data into SAS along with formats and a data dictionary. Please see my paper, presentation, and code by filling out this survey: https://redcap.link/REDCapIsMySASFriend. Distribution using this method allows me to distribute only the most recent version, so you can check back to this survey in the future as well. Also available on LexJansen under SESUG 2022: "REDCap: Your SAS Friend for EHR Manual Abstraction"
11-08-2022
01:31 PM
1 Like
I have recently published a macro to export REDCap data into SAS along with formats and a data dictionary. Please see my paper, presentation, and code by filling out this survey: https://redcap.link/REDCapIsMySASFriend. Distribution using this method allows me to distribute only the most recent version, so you can check back to this survey in the future as well. Also available on LexJansen under SESUG 2022: "REDCap: Your SAS Friend for EHR Manual Abstraction"
11-08-2022
01:25 PM
I have recently published a macro to export REDCap data into SAS along with formats and a data dictionary. Please see my paper, presentation, and code by filling out this survey: https://redcap.link/REDCapIsMySASFriend. Distribution using this method allows me to distribute only the most recent version, so you can check back to this survey in the future as well. Also available on LexJansen under SESUG 2022. "REDCap: Your SAS Friend for EHR Manual Abstraction"
11-08-2022
01:21 PM
I have recently published a macro to export REDCap data into SAS along with formats and a data dictionary. There is also a token cloaking macro that will utilize external files to pull in your token and set options to prevent disclosure. Please see my paper, presentation, and code by filling out this survey: https://redcap.link/REDCapIsMySASFriend. You can also find them on lexjansen under SESUG 2022. Distribution using this method allows me to distribute only the most recent version, so you can check back to this survey in the future as well.
11-08-2022
01:14 PM
I have recently published a macro to export REDCap data into SAS along with formats and a data dictionary. Please see my paper, presentation, and code by filling out this survey: https://redcap.link/REDCapIsMySASFriend. Distribution using this method allows me to distribute only the most recent version, so you can check back to this survey in the future as well.
06-27-2018
09:16 AM
Hello ballardw,
Thank you for your comments! The mix in units was an oversight that I hadn't caught. I've fixed that to show inches only and also changed hpos= to 0. Unfortunately the output has not changed. The titles display correctly in the gridded layout (appropriate size, centered on the column, appearing in the center of the page without being cut off), but the graphs do not. The bottom of the page has been cut off in the picture because the output correctly appears in the first row of my gridded layout and nothing comes below it. Here is the full page after the changes: [screenshot attached in the original post]
The problem is that the first map seems to be centered on the page instead of on the first column, and the second map is cut off by the page ending and is therefore only half showing. I'm using longitude and latitude coordinates in my mapping file, so I'm not sure how to change the shape files to display differently, if that's what's needed. Ideally I would like each map to appear in its respective column, centered on the column and its heading. If there is a better way to code this while still using the BY statement, I would be open to that as well.
Thanks again!