I have this SAS code using first. and recursive if else loops which I need to convert to an SQL query. But I am not able to understand how to do it because of the retain statement and calling numerous variables in multiple loops.
Can someone please help me to convert it to SQL query?
data new_data; set raw_data(where=(appr_date ne .)); by loan_number period_dt; retain prev_appr_date prev_upd_appr_date appraisal_cross; if first.loan_number then do; if appr_date > period_dt then do; prev_appr_date=appr_date; upd_appr_date=period_dt; prev_upd_appr_date=upd_appr_date; appraisal_cross=1; x=1; end; else if appr_date <= period_dt then do; prev_appr_date=appr_date; upd_appr_date=appr_date; prev_upd_appr_date=appr_date; appraisal_cross=0; x=2; end; end; else do; if appr_date<prev_upd_appr_date and appr_date ne prev_appr_date then do; upd_appr_date=max(period_dt,prev_upd_appr_date); appraisal_cross=1; x=3; end; else if appr_date=prev_appr_date and appraisal_cross=1 then do; upd_appr_date=prev_upd_appr_date; appraisal_cross=1; x=4; end; else if appr_date>=prev_upd_appr_date and appr_date>period_dt then do; upd_appr_date=period_dt; appraisal_cross=1; x=5; end; else if appr_date>=prev_upd_appr_date and appr_date <=period_dt then do; upd_appr_date=appr_date; appraisal_cross=0; x=6; end; prev_upd_appr_date=upd_appr_date; prev_appr_date=appr_date; end; run;
Thank you in advance. Hope to get some help on this.
Some things in a data step have no exact equivalent in PROC SQL. In particular, first.loan_number has no SQL equivalent, and there may be others things in your code that also don't translate.
SQL is not the right tool to replicate this algorithm since it does not have the concept of FIRST or LAST. It just operates on SETs.
Can you explain the purpose of the code? Or is your actual question to understand what the code is doing?
PS There are no loops in that code, other than the implied loop over all of the observations being read from RAW_DATA.
@rahulsaha2127 wrote:
I have to convert this code to Pyspark. Since I am unable to do that I was trying to convert it to sql, and then pass the sql query in Pyspark
So please explain in words what the code is doing. And perhaps better WHY it is doing that as part of the larger problem you are trying to solve.
Use panda's instead or a method in pyspark that supports looping through each row.
Or break the process down to it's steps and redesign it entirely using a SQL methodology which may involve multiple steps and reads.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.