Hello,
I am new to SAS and am using The Little SAS Book to educate myself.
I have a question based on this example from Chapter 3 of the book:
How does SAS know to read each dataline and compute the AvgScore, DayEntered and Type for each of the datalines? I was expecting to see a loop to read through the data line by line.
Would like some clarity on how SAS reads this data in the pumpkin.dat file.
Does it read line 1, go through the list of variables, then go to line 2 and do the same, and so on?
Appreciate your help in understanding this.
The SAS DATA step is an implied loop. You don't have to write a loop; in fact, writing one here would be wrong.
So the DATA step reads the first line of data from pumpkin.dat and computes the values of AvgScore, DayEntered and Type for that line. Then it reads the second line and computes those three variables for it, and it keeps going like that through the last line of data.
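To make the implied loop visible, here is a minimal sketch. It is not the book's program: the inline DATALINES records, the three score variables and the PUT statement are made up for illustration, and only the three computed variables come from the question. The PUT statement writes one line to the log per pass, so you can see the statements run once for each input line without any DO loop.

```sas
data contest;
   /* Hypothetical inline records standing in for pumpkin.dat */
   input Name $ Type $ Date :mmddyy10. Scr1 Scr2 Scr3;

   /* These statements run once per input line (the implied loop) */
   AvgScore   = mean(Scr1, Scr2, Scr3);
   DayEntered = day(Date);
   Type       = upcase(Type);

   /* _N_ counts the iterations of the DATA step */
   put _n_= Name= Type= AvgScore= DayEntered=;
   datalines;
Alicia c 10/28/2012 7.5 8.0 8.2
Matthew d 10/30/2012 6.5 7.0 7.1
Liz c 10/29/2012 8.0 8.5 8.9
;
run;
```

The log should show three lines, with _N_ going from 1 to 3, one for each data line.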
Thanks Paige, appreciate the prompt response.
So am I right in understanding that at the point when SAS reads the first line, it knows nothing about what the second or later lines in the data file may contain?
So it works more as an interpreter than a compiler? But I read there is compiling happening when we 'run' SAS code. Curious to know what happens in this 'compilation' vs. the interpretation.
Here is a good explanation from the documentation:
Wow this is amazing, thanks for pointing me to this Kathryn. 🙏
Hi @SASsusrik
The SAS DATA step is compiled. From a logical processing point of view, it processes one input record at a time (or, when a data set is read, one observation at a time) in the Program Data Vector. Of course, the physical execution is more sophisticated, and data are read in groups for performance.
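One way to see the compile phase at work is the small sketch below (the variable names and data are hypothetical). The first PUT runs before INPUT has read anything, yet x and y already exist in the Program Data Vector with missing values, because the compiler set them up before execution started.

```sas
data _null_;
   /* On the first pass nothing has been read yet, but x and y
      already exist in the PDV (as missing values) */
   if _n_ = 1 then put 'Before the first INPUT: ' x= y=;

   input x y;

   /* After INPUT, the PDV holds the current record only */
   put _n_= x= y=;
   datalines;
1 2
3 4
;
run;
```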
Let me share 3 very good articles about processing in DATA step:
1) Don Henderson - "SAS Supervisor": https://communities.sas.com/t5/SAS-Communities-Library/The-SAS-Supervisor/ta-p/429216
It explains how SAS thinks through DATA steps.
2) Paul Dorfman - "The Magnificent DO": https://pages.mini.pw.edu.pl/~jablonskib/SASpublic/WUSS2024_125/(Dorfman%202013)%20Paul%20M.%20Dorfm...
This one explains how SAS thinks about loops. It also discusses your point: "I would expect explicit looping to read data."
3) Ian Whitlock - "How to Think Through the SAS DATA Step": https://pages.mini.pw.edu.pl/~jablonskib/SASpublic/WUSS2024_125/(Whitlock%202006)%20Ian%20Whitlock,%...
One more very good explanation about "how it goes under the hood."
I really highly recommend all of them.
All the best
Bart
Also this video: https://www.youtube.com/watch?v=gonmrtJopWo
will give you a lot of good explanations.
Bart
And one more, a bit longer, video on DATA step processing: https://www.youtube.com/watch?v=7EMOai04_eg
Bart
Thanks Bart, for kindly sharing so many resources. I will go through them. Appreciate it!
The data step always loops until something stops it. So I would restate the question as:
How does SAS know when to STOP looping through the data?
What normally stops a data step is when it reads past the end of its inputs.
In your example, that happens when the INPUT statement finds there are no more lines to read from Pumpkin.dat.
In the case of a data step that reads SAS datasets (with a SET or MERGE statement, for example), it normally stops when the SET or MERGE finds no more observations to read.
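A small sketch of that stopping behavior, using two made-up data lines: the step starts a third iteration, but the INPUT statement finds nothing left to read, so the step ends there and the second PUT never runs on that pass.

```sas
data _null_;
   put 'Starting iteration ' _n_;

   /* On the third pass INPUT reads past the end of the data
      and the step stops here */
   input x;

   put '  read ' x=;
   datalines;
10
20
;
run;
```

The log should show "Starting iteration" three times but only two "read" lines.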
Other ways to stop a data step are: