BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Syntas_error
Quartz | Level 8

This is from another SAS practice questoin. 

 

We have a raw data file:

 

1001 77,164.19 76,804.75 74,384.27

1002 76,612.93 81,456.34 82,063.97

1003 82,185.16 79,742.33

 

We want to transform the raw data file into the following data set:

 

Store              Sales                   Month

1001               77,164.19            1

1001               76,804.75            2

1001               74,384.27            3

1002               76,612.93            1

1002               81,456.34            2

1002               82,063.97            3

1003               82,185.16            1

1003               79,742.33            2

 

There are several alternatives, to choose from, the correct one being:

 

data perm.topstores;
infile sales98 missover;
input Store Sales : comma. @;
Month=0;
do while (sales ne .);
month + 1;
output;
input sales : comma. @;
end;
run;

 

Now, could someone walk me through what's happening in the different steps?

 

I think I understand the virst relevant line: 

 

input Store Sales : comma. @;
 
This simply specifies the variables, the ':' alows us to use an informat for 'sales', so we can process those non-standard values. The @-sign should "lock the row into place" allowing multiple observations to be read from the same row. Though in my understanding, this would mean values from each row should be assigned to Store "and" Sales, so for the first row we would get something like:
 

Store              Sales                  

1001               77,164.19

76,804.75       74,384.27
 
The next segment of code is hard to follow: 
 
do while (sales ne .);
month + 1;
output;
input sales : comma. @;
end;
 
 
What exactly does "do while (sales ne .);", mean, do we execute the steps as long as the values of sales aren't missing?
 
How exacly does this step specify that the counter is to be reset for each new store?
 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

You've picked a good problem to deepen your understanding of how the DATA step works.  Here's where your logic breaks down.

 

The INPUT statement is more than a roadmap on how to read the incoming lines of data.  It's actually a set of instructions, which execute repeatedly (for each line of data.  So for the first INPUT statement, and the first line of data, this is all you get:

 

Store   Sales

1001    77164.19

 

Those values just sit in memory, awaiting further developments.  Next, the software adds MONTH, with a value of 0.  Then the DO loop begins, since SALES > . is true.  It increases MONTH, giving you:

 

Store   Sales     Month

1001    77164.19    1

 

The OUTPUT statement executes next, copying these values from memory to the output data set.  (Technically, I'm skipping a little bit of technical detail here, not important for our purposes.)

 

Then the second INPUT statement executes, giving you:

 

Store   Sales     Month

1001    76804.75    1

 

Again, those values are just sitting in memory.  The DO loop continues, increasing MONTH to 2.  Then the OUTPUT statement executes again, copying the current values from memory to the output data set.

 

The DO loop again checks the value of SALES and finds that it is greater than missing.  So the DO loop executes again.

 

In a nutshell, those are the next principles to absorb, to increase your understanding of the DATA step.

View solution in original post

5 REPLIES 5
Kurt_Bremser
Super User

The missover option tells SAS to return a missing value when you try to read more items than are present on the current line.

The @ holds the input line.

So the do while keeps reading from the input line until the end is reached (input returns a missing), at which point it goes to the next data step iteration, which causes an automatic skip to the next input line.

Astounding
PROC Star

You've picked a good problem to deepen your understanding of how the DATA step works.  Here's where your logic breaks down.

 

The INPUT statement is more than a roadmap on how to read the incoming lines of data.  It's actually a set of instructions, which execute repeatedly (for each line of data.  So for the first INPUT statement, and the first line of data, this is all you get:

 

Store   Sales

1001    77164.19

 

Those values just sit in memory, awaiting further developments.  Next, the software adds MONTH, with a value of 0.  Then the DO loop begins, since SALES > . is true.  It increases MONTH, giving you:

 

Store   Sales     Month

1001    77164.19    1

 

The OUTPUT statement executes next, copying these values from memory to the output data set.  (Technically, I'm skipping a little bit of technical detail here, not important for our purposes.)

 

Then the second INPUT statement executes, giving you:

 

Store   Sales     Month

1001    76804.75    1

 

Again, those values are just sitting in memory.  The DO loop continues, increasing MONTH to 2.  Then the OUTPUT statement executes again, copying the current values from memory to the output data set.

 

The DO loop again checks the value of SALES and finds that it is greater than missing.  So the DO loop executes again.

 

In a nutshell, those are the next principles to absorb, to increase your understanding of the DATA step.

Syntas_error
Quartz | Level 8
Okay, for that to be true a coule of things need to follow I think:

1. The inputed values needs to remain in the memory after they've been outputed.

2. the variable "sales" needs to be overwritten by the new variable with the same name.

3. Despite (2), the second input statement picks a new value (one step to the right) for the new variable.

Do I interpret this correctly?
Tom
Super User Tom
Super User

Pretty much. 

 

Values for the current observation are always "in memory" during the execution of a data step.  That is why you can use the variables in operations. 

 

Variables are not really "overwritten".  They have values. The values vary across observations of the dataset and can vary during the execution of the code. Remember that SAS will compile your data step code before it starts running it. So the INPUT statement itself doesn't define a variable. I just reads values into the variable.  It might appear to define the variable because allows lazy programming.  You don't have to define your variables before using them. SAS will make a guess at how to define the variable based on how you first use it in the code.

 

SAS keeps a pointer to where on the line you are currently reading.  (You can use the COLUMN= option on the INFILE statement to have it share that value as a variable in your data step.)  So when you again read SALES it starts looking from where it left off after the last read (because of the trialing @ on the input statement).

Astounding
PROC Star

You interpret correctly.  While this is an oversimplification, it helps to think of there being three storage locations:

 

  • The raw data line that the program reads from
  • The values in memory
  • The values written to the output data set

 

So to address your questions ...

 

The OUTPUT statement copies from values in memory to the output data set.  It has no impact on the values in memory.

 

"Overwriting SALES" means changing the value of SALES in memory.  In this case, that is accomplished by executing an INPUT statement that reads from the raw data line, and replaces the value in memory.  So it's not a new variable with the same name ... it's the value in memory being replaced by a new value.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 5 replies
  • 737 views
  • 3 likes
  • 4 in conversation