I use SAS Enterprise Guide 8.3, with Linux server and Windows SAS EG client. We recently moved from Netezza to Snowflake/Azure. This program expands the membership date spans into member months and has worked after we moved to the cloud however 2 months later (no changes to data) the program churns and I have to kill it. Does Snowflake not like the syntax for the do while program? Any ideas why this did work after move to Snowflake and now 1 month later it churns?
SAMPLE DATA
data member_spans;
infile datalines delimiter= ',';
input patient_id $ startElig:date9. endElig:date9. transfer;
format startElig endElig date9. transfer dollar12.2;
datalines;
P12345678,01JAN2023,31OCT2023, 4000
Q23456789,01JAN2023,30JUN2023,461.43
R34567891,01JAN2023,31MAR2023,544.07
S45678912,01JAN2023,31AUG2023,3678.69
S45678912,01SEP2023,31OCT2023,923.46
T56789123,01OCT2023,31OCT2023,18930
U67891234,01JAN2023,31MAR2023,-1269.85
U67891234,01APR2023,31OCT2023,-3019.41
V78912345,01JAN2023,18JUL2023,-3196.49
V78912345,19JUL2023,14OCT2023,-1413.52
V78912345,15OCT2023,31OCT2023,-273.07
run;
/*Expand into mms*/
data expand_2_mms;
set Member_Spans ;
by patient_id startElig endElig ;
counter=1;
if transfer ne .;
do while(startElig <= endElig);
output;
startElig=intnx('month',startElig,1);
counter=counter+1;
if startElig >= endElig then leave;
end;
format startElig date9.;
run;
DATA step code does not run in Snowflake, so the techniques you use for DATA step processing aren't an issue. Here are some considerations:
1. You really shouldn't use a subsetting IF statement when reading database data in SAS if you can get the same result using a WHERE statement. To improve the efficiency of your code, replace:
if transfer ne .;
with:
where transfer is not null;
This will minimize the amount of data pulled into SAS from Snowflake for processing.
2. Is it possible you have some bad records in your data where the startElig value is missing? If so, your code will go into an infinite loop.
You can avoid this situation by adding code to avoid reading records with missing startElig values:
where transfer is not null and startElig is not null;
or add code to impute a value for a missing startElig before entering the loop:
startElig=colaesce(startElig, endElig-1);
do while(startElig <= endElig);
*... more statement s ...;
end;
3. Is the dataset you are reading growing in size? If so, expect longer run times as the size of the data grows.
You don't show any connection to Snowflake so I am guessing that perhaps Snowflake stores dates differently than Netezza and your connection with Snowflake isn't doing the date conversion "correctly".
Perhaps you could share exactly how you connect to Snowflake and the code submitted with some LOG entry to provide more details that someone familiar with Snowflake can diagnose.
DATA step code does not run in Snowflake, so the techniques you use for DATA step processing aren't an issue. Here are some considerations:
1. You really shouldn't use a subsetting IF statement when reading database data in SAS if you can get the same result using a WHERE statement. To improve the efficiency of your code, replace:
if transfer ne .;
with:
where transfer is not null;
This will minimize the amount of data pulled into SAS from Snowflake for processing.
2. Is it possible you have some bad records in your data where the startElig value is missing? If so, your code will go into an infinite loop.
You can avoid this situation by adding code to avoid reading records with missing startElig values:
where transfer is not null and startElig is not null;
or add code to impute a value for a missing startElig before entering the loop:
startElig=colaesce(startElig, endElig-1);
do while(startElig <= endElig);
*... more statement s ...;
end;
3. Is the dataset you are reading growing in size? If so, expect longer run times as the size of the data grows.
Thanks Jedi! You are spot on.
I do a validation of the data prior to this data step so I knew that wasn't the issue.
I implemented the 'where variables are not nulll) for both transfer and startElig and it ran successfully in .26 seconds.
Very much appreciate your prompt reply and assistance!
Can you share the log from running the step (and after waiting a while, killing it)? There should be some helpful clues there.
To detect the potential infinite loop problem, you could add something like:
counter=1;
do while(startElig <= endElig);
output;
startElig=intnx('month',startElig,1);
if startElig >= endElig then leave;
counter=counter+1;
if counter > 1000 then do;
put "ERROR: infinite loop!" patientID= startElig= endElig=;
leave;
end;
end;
I agree with SAS Jedi's suggestion that a missing value for StartElig would be one easy way for your current code to trigger an infinite loop.
Thanks for your prompt response and suggestion to identify an infinite loop Quentin.
Will definitely use this in the future!
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.