- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I use SAS Enterprise Guide 8.3, with Linux server and Windows SAS EG client. We recently moved from Netezza to Snowflake/Azure. This program expands the membership date spans into member months and has worked after we moved to the cloud however 2 months later (no changes to data) the program churns and I have to kill it. Does Snowflake not like the syntax for the do while program? Any ideas why this did work after move to Snowflake and now 1 month later it churns?
SAMPLE DATA
data member_spans;
infile datalines delimiter= ',';
input patient_id $ startElig:date9. endElig:date9. transfer;
format startElig endElig date9. transfer dollar12.2;
datalines;
P12345678,01JAN2023,31OCT2023, 4000
Q23456789,01JAN2023,30JUN2023,461.43
R34567891,01JAN2023,31MAR2023,544.07
S45678912,01JAN2023,31AUG2023,3678.69
S45678912,01SEP2023,31OCT2023,923.46
T56789123,01OCT2023,31OCT2023,18930
U67891234,01JAN2023,31MAR2023,-1269.85
U67891234,01APR2023,31OCT2023,-3019.41
V78912345,01JAN2023,18JUL2023,-3196.49
V78912345,19JUL2023,14OCT2023,-1413.52
V78912345,15OCT2023,31OCT2023,-273.07
run;
/*Expand into mms*/
data expand_2_mms;
set Member_Spans ;
by patient_id startElig endElig ;
counter=1;
if transfer ne .;
do while(startElig <= endElig);
output;
startElig=intnx('month',startElig,1);
counter=counter+1;
if startElig >= endElig then leave;
end;
format startElig date9.;
run;
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA step code does not run in Snowflake, so the techniques you use for DATA step processing aren't an issue. Here are some considerations:
1. You really shouldn't use a subsetting IF statement when reading database data in SAS if you can get the same result using a WHERE statement. To improve the efficiency of your code, replace:
if transfer ne .;
with:
where transfer is not null;
This will minimize the amount of data pulled into SAS from Snowflake for processing.
2. Is it possible you have some bad records in your data where the startElig value is missing? If so, your code will go into an infinite loop.
You can avoid this situation by adding code to avoid reading records with missing startElig values:
where transfer is not null and startElig is not null;
or add code to impute a value for a missing startElig before entering the loop:
startElig=colaesce(startElig, endElig-1);
do while(startElig <= endElig);
*... more statement s ...;
end;
3. Is the dataset you are reading growing in size? If so, expect longer run times as the size of the data grows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
You don't show any connection to Snowflake so I am guessing that perhaps Snowflake stores dates differently than Netezza and your connection with Snowflake isn't doing the date conversion "correctly".
Perhaps you could share exactly how you connect to Snowflake and the code submitted with some LOG entry to provide more details that someone familiar with Snowflake can diagnose.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
DATA step code does not run in Snowflake, so the techniques you use for DATA step processing aren't an issue. Here are some considerations:
1. You really shouldn't use a subsetting IF statement when reading database data in SAS if you can get the same result using a WHERE statement. To improve the efficiency of your code, replace:
if transfer ne .;
with:
where transfer is not null;
This will minimize the amount of data pulled into SAS from Snowflake for processing.
2. Is it possible you have some bad records in your data where the startElig value is missing? If so, your code will go into an infinite loop.
You can avoid this situation by adding code to avoid reading records with missing startElig values:
where transfer is not null and startElig is not null;
or add code to impute a value for a missing startElig before entering the loop:
startElig=colaesce(startElig, endElig-1);
do while(startElig <= endElig);
*... more statement s ...;
end;
3. Is the dataset you are reading growing in size? If so, expect longer run times as the size of the data grows.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks Jedi! You are spot on.
I do a validation of the data prior to this data step so I knew that wasn't the issue.
I implemented the 'where variables are not nulll) for both transfer and startElig and it ran successfully in .26 seconds.
Very much appreciate your prompt reply and assistance!
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Can you share the log from running the step (and after waiting a while, killing it)? There should be some helpful clues there.
To detect the potential infinite loop problem, you could add something like:
counter=1;
do while(startElig <= endElig);
output;
startElig=intnx('month',startElig,1);
if startElig >= endElig then leave;
counter=counter+1;
if counter > 1000 then do;
put "ERROR: infinite loop!" patientID= startElig= endElig=;
leave;
end;
end;
I agree with SAS Jedi's suggestion that a missing value for StartElig would be one easy way for your current code to trigger an infinite loop.
Next up: Rick Wicklin presents Ten Tips for Effective Statistical Graphics (with SAS code) on Wednesday March 26.
Register now at https://www.basug.org/events.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thanks for your prompt response and suggestion to identify an infinite loop Quentin.
Will definitely use this in the future!