BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
lalapopia
Fluorite | Level 6

Hello!

 

*Little background: I'm currently an intern who was asked to translate a coworker's SAS code to Python. This is the first time I ever read SAS code and my coworker just left for vacation. Unfortunately my company didn't install SAS in my PC, so I can't see the tables produced. 

 

My question: In the code below, there are 3 DO loops, each runs from 1 to a number. Each loop seems to create a column called vraidate and to update the values of a column called deltaEPS, however the values assigned to the rows of these columns are different in each loop! What happens to the values of these columns from the first two loops? They don't seem to be used for anything else in the code.

Does the program create a brand new column each time?

 

Finally, the second part of the code assigns one value of vraidate to the date. Which value is this?

The code:

data series2;set moustache;by country_name secteur;if last.Secteur=1;delaiFY1=intck('month',date,dateFY1);
if delaiFY1=. then delete;
 do i=1 to delaiFY1;
 vraidate=intnx('Month',date,i,'end');format vraidate date9.;
 if EPS_FY1_INDEX<=0 then deltaEPS=EPS_12M_TR;else deltaEPS=EPS_12M_TR+(i/delaiFY1)*(EPS_FY1_INDEX-EPS_12M_TR);
 output;
 end;
 do j=1 to 12;
 vraidate=intnx('Month',dateFY1,j,'end');format vraidate date9.;
 if EPS_FY2_INDEX<=0 then deltaEPS=EPS_FY1_INDEX;else deltaEPS=EPS_FY1_INDEX+(j/12)*(EPS_FY2_INDEX-EPS_FY1_INDEX);
 output;
 end;
 do k=1 to 12;
 vraidate=intnx('Month',dateFY2,k,'end');format vraidate date9.;
 if EPS_FY3_INDEX<=0 then deltaEPS=EPS_FY2_INDEX;else deltaEPS=EPS_FY2_INDEX+(k/12)*(EPS_FY3_INDEX-EPS_FY2_INDEX);
 output;
 end;run;
 
data series2;set series2;
date=vraidate;format date date9.;
ProjectionEPS=deltaEPS;
keep country_name Secteur date FY0_ENDDATE ProjectionEPS;
proc sort;by country_name Secteur date;run;
1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

@lalapopia wrote:

I don't really follow, isn't delaiFY1 just an integer difference between two dates? 

delaiFY1=intck('month',date,dateFY1);

So the first loop goes from 1 to delaiFY1, say it's 6. Then we create vraidate, but change it in the next loop, which goes from 1 to 12. 

What happens to vraidate in the second and third loops?


If you run code like this:

data test;
  do fred=1 to 5;
    output;
  end;
run;

You get a dataset with one variable named FRED and five observations where the value of FRED is different in each of them.

 

Your example code is doing the exact same thing, only with a more complicated dataset.

 

The value of VRAIDATE that SAS currently has in memory is changing as the lines of code in the data step execute.  But the OUTPUT statement writes a snapshot of the current values when it executes.

 

The variable VRAIDATE is "created" when SAS compiles the logic of the dataset.  So the variable is numeric and has a date format attached to it.  Its definition does not change while the step runs just because it is referenced in more than one statement.  It is the value of VRAIDATE that will vary.  That is why they are called variables.

View solution in original post

9 REPLIES 9
Tom
Super User Tom
Super User

The OUTPUT statement writes an observation to the output dataset when it runs.

So this loop:

do i=1 to delaiFY1;
  vraidate=intnx('Month',date,i,'end');
  if EPS_FY1_INDEX<=0 then deltaEPS=EPS_12M_TR;
  else deltaEPS=EPS_12M_TR+(i/delaiFY1)*(EPS_FY1_INDEX-EPS_12M_TR);
  output;
end;
format vraidate date9.;

will output DELIAFY1 observations to the output dataset. Each one will have a different value of VRAIDATE and possibly a different value of DELTAEPS.

lalapopia
Fluorite | Level 6

I don't really follow, isn't delaiFY1 just an integer difference between two dates? 

delaiFY1=intck('month',date,dateFY1);

So the first loop goes from 1 to delaiFY1, say it's 6. Then we create vraidate, but change it in the next loop, which goes from 1 to 12. 

What happens to vraidate in the second and third loops?

Tom
Super User Tom
Super User

@lalapopia wrote:

I don't really follow, isn't delaiFY1 just an integer difference between two dates? 

delaiFY1=intck('month',date,dateFY1);

So the first loop goes from 1 to delaiFY1, say it's 6. Then we create vraidate, but change it in the next loop, which goes from 1 to 12. 

What happens to vraidate in the second and third loops?


If you run code like this:

data test;
  do fred=1 to 5;
    output;
  end;
run;

You get a dataset with one variable named FRED and five observations where the value of FRED is different in each of them.

 

Your example code is doing the exact same thing, only with a more complicated dataset.

 

The value of VRAIDATE that SAS currently has in memory is changing as the lines of code in the data step execute.  But the OUTPUT statement writes a snapshot of the current values when it executes.

 

The variable VRAIDATE is "created" when SAS compiles the logic of the dataset.  So the variable is numeric and has a date format attached to it.  Its definition does not change while the step runs just because it is referenced in more than one statement.  It is the value of VRAIDATE that will vary.  That is why they are called variables.

lalapopia
Fluorite | Level 6

Oh, I think I understand now! Thanks a bunch!!

 

Reeza
Super User
Do you have the SAS data sets? If so, you can install either the SAS Universal Viewrer to use it and there's is a python package to read a SAS data sets.
lalapopia
Fluorite | Level 6
No, I only have the code in Notepad, no datasets
Reeza
Super User

For code conversion, comment every line with what you think is happening, then post it back. We can let you know where you're wrong/right.

 

Can we assume you understand that SAS goes through data line by line and loops are generally done on a single row of data, because a data step is an automatic loop? I'm also assuming you've ensured your data sets can fit into memory in Python? That's not how SAS processes data - not an in memory application, though CAS is in memory now.

Unasked for advice: Doing code conversion line by line seems simpler, but actually takes you longer in the long run - from doing many of these projects in various languages. Then take the process and redesign it using the most appropriate python methods, which is likely not looping given how python performs.

ballardw
Super User

If that is the code your coworker wrote then the first thing to do when he/she returns is to beat them severely about the head and shoulders for lousy code structures making it hard to read.

Reformatting the code for the first data step a bit:

data series2;
set moustache;
by country_name secteur;
if last.Secteur=1;
delaiFY1=intck('month',date,dateFY1);
if delaiFY1=. then delete;
 do i=1 to delaiFY1;
   vraidate=intnx('Month',date,i,'end');
   format vraidate date9.;
   if EPS_FY1_INDEX<=0 then deltaEPS=EPS_12M_TR;
   else deltaEPS=EPS_12M_TR+(i/delaiFY1)*(EPS_FY1_INDEX-EPS_12M_TR);
   output;
 end;
 do j=1 to 12;
   vraidate=intnx('Month',dateFY1,j,'end');
   format vraidate date9.;
   if EPS_FY2_INDEX<=0 then deltaEPS=EPS_FY1_INDEX;
   else deltaEPS=EPS_FY1_INDEX+(j/12)*(EPS_FY2_INDEX-EPS_FY1_INDEX);
   output;
 end;
 do k=1 to 12;
   vraidate=intnx('Month',dateFY2,k,'end');
   format vraidate date9.;
   if EPS_FY3_INDEX<=0 then deltaEPS=EPS_FY2_INDEX;
   else deltaEPS=EPS_FY2_INDEX+(k/12)*(EPS_FY3_INDEX-EPS_FY2_INDEX);
   output;
 end;
run;
 

The OUTPUT statements tell says to write the current data vector to the data set. So you get a different loop each writing to the same variable. The first line in each do loop is setting a common value of the date variable, I would guess for a report grouping value. The values of vraidate are three, I suppose, different end of period/ year values. Since the loop counters are used in the calculation of DELTAEPS then that appears to be the purpose: calculate 12 delta values for each "date". Why? That is your internal business logic.

 

The repeated use of variable names with components like 12M FY1, FY2 would suggest values associated with fiscal years or similar and may reflect translation of a process originally developed for spreadsheets that might benefit from redesign of the data.

 

A large number of follow up processes would not actually need what has been done in that second data set. The only thing I can see is that setting specific variable names may have a step using that data set with those variable names instead of VRAIDATE or DELTAEPS. I might suspect some macro code later that is written with hardcoded variable names.

 

lalapopia
Fluorite | Level 6

Thank you so much for this answer! This is helpful.

(yes, my coworker is particularly messy haha)

 

So what the loops do is group the data by country_name and Secteur, and then create 6  (usually delaiFY1 is 6 months) + 12 + 12 respective vraidate and deltaEPS values? (each is indeed for a fiscal year!) 

I think the program is supposed to do a linear regression...

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 9 replies
  • 1992 views
  • 2 likes
  • 4 in conversation