BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
teli4
Fluorite | Level 6

Hi everyone,

This post is a follow-up of the previous one with more detail -honestly, I'm not entirely sure what happened to the previous question. The problematic piece of code is this one:

_________

data test1;
input visit $16.;
datalines;
First
Second
;

 

data nulll;
input index;
datalines;
.
;

 

data test2;
stop;
set nulll;
run;

 

data test;
set test2 test1;
if index(VISIT, 'Fir')^=0 then do;
index=100;
end;

 

proc print data=test;
run;

__________

The output variable "index" contains two values of 100 whereas the logic implies that only one should be there.

 

On the other hand if we split the last data step in two it works properly (that is, firstly set test1 and test2 to get test and then set test to do the procedures) but I'm afraid to post the full code as it may get flagged as spam.

 

Can someone please explain what the issue is - I believe it's something to do with setting the datasets and performing operations on the data in the same step.

 

Thanks in advance,

teli4

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Variables from incoming datasets are automatically retained. index is in dataset test2, so it is retained. With the first observation from test1, index is set to 100, and keeps that value, since it is never overwritten.

View solution in original post

14 REPLIES 14
ballardw
Super User

Can you describe why all the messing around with a 0 observation data set is necessary.

teli4
Fluorite | Level 6
Hello there! The names of the given variables and the ones that should be obtained have to be taken from an external file and are fixed - the way this dummy dataset works is it ensures that the variables are properly named. Of course, the code above is simplified as much as I could and is just there to give a brief overview of the problem. Also, thanks for taking the time to consider this question!
ballardw
Super User

@teli4 wrote:
Hello there! The names of the given variables and the ones that should be obtained have to be taken from an external file and are fixed - the way this dummy dataset works is it ensures that the variables are properly named. Of course, the code above is simplified as much as I could and is just there to give a brief overview of the problem. Also, thanks for taking the time to consider this question!

This sounds to me like you have a process that is flawed at the start. Why would variables that should be there not be in the data set? Or possibly be named incorrectly?

 

You might also consider MERGE instead of SET for such a marginal purpose.

Kurt_Bremser
Super User

Variables from incoming datasets are automatically retained. index is in dataset test2, so it is retained. With the first observation from test1, index is set to 100, and keeps that value, since it is never overwritten.

teli4
Fluorite | Level 6

Thank You very much for taking the time to answer! However, as I am at a beginner level, I'm a bit confused - with the set procedure I ensure that all variable from both data sets will be included (I'm assuming that's what you meant with retaining the variable index). However, shouldn't the logical operator "if" ensure that the condition is checked for each row, that is, for each observation of the variable 'visit'? That would imply that one of those logical checks would fail and one will not, so the value will be overwritten, no?

Kurt_Bremser
Super User

Your IF does not have an ELSE branch, so only the the action following the THEN branch will be executed, once. Since the resulting variable is retained, the value does not change thereafter.

teli4
Fluorite | Level 6

Thanks! So that means that the "if" operator does not check each row (i.e. each observation of the particular variable) if an "else" statement is not provided but rather uses only the outcome of the first true logical result and plugs it all the way down?

Kurt_Bremser
Super User

The IF statement checks its condition in EVERY observation. Since only a THEN branch is supplied, only that action can be executed. The result of that execution (which happens once) is kept because of the automatic retain, caused by the resulting variable being present in an incoming dataset.

teli4
Fluorite | Level 6

But why does that happen when the 2 datasets (test1 and test2) are set AND the if condition is used altogether in the same data step but the whole things works properly when we split the above code by first setting test1 and test2 in one dataset and then applying the IF condition in the new data set (that is, use 2 steps)? What I mean is why does the code

_________________________________

data test1;
input visit $16.;
datalines;
First
Second
;

 

data nulll;
input index;
datalines;
.
;

 

data test2;
stop;
set nulll;
run;

 

data test;
set test2 test1;

 

/*Here is the difference - I've split the data step in the code into 2 data sets */

 

data test;
set test;
if index(VISIT, 'Fir')^=0 then do;
index=100;
end;

 

proc print data=test;
run;

 

work as intended?

Kurt_Bremser
Super User

When you set both datasets, and use the condition, the following happens:

  • all variables are created in the PDV
  • nothing is read into index, because the first dataset is empty, and the second dataset does not have the variable
  • therefore, any value set once by the condition is retained

If you do it in separate steps, in the second step

  • the variables are created as above
  • but now index IS present in the incoming dataset (with missing values)
  • therefore, the contents are overwritten with every observation read
teli4
Fluorite | Level 6

Thank You very much, much appreciated. One last question before I accept this as the as the solution - would you say that the second approach is better (i.e. setting the datasets and then using IF) when not using an ELSE statement?

 

Once again, thanks! I'm sorry if I took too much out of your time.

Kurt_Bremser
Super User

If you want something to happen when a condition is not met, then you have to use the ELSE. Don't rely on "it's going to happen on its own". See Maxim 31.

 

Whether you combine steps for efficiency, or keep them separate for coding clarity, is up to you.

teli4
Fluorite | Level 6
Will have a look at the list for sure - I, however, assumed that if you don't enter an ELSE statement then the IF operator just evaluates the false statement as false, does nothing and moves on with the evaluation of the next row
Kurt_Bremser
Super User

Before you wonder about the effects of a missing ELSE, you need to get familiar with the workings of the DATA step as such. When are variables set to missing, and which variables will be set to missing at that moment.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 14 replies
  • 2039 views
  • 7 likes
  • 3 in conversation