BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
plevcek
Fluorite | Level 6

Hi,

 

I really don't understand why those 2 datasteps are acting differently.

I am using SAS 9.4.

It seems that when I declare variables using dataset with OBS=0 it declares all variables as RETAIN variables.

 

Let's say we have 2 datasets:

 

data d1;
infile datalines;
input ID Name1$30.;
datalines;
1 john
3 mike
4 tony
2 isaac
;
data d2;
infile datalines;
input ID Name2$30.;
datalines;
9 marc
2 blake
;

 

And then I have 2 options of declaring variables from DATA2.

 

DATA new;
SET d1;
FORMAT name2 $30.;
	
	IF _N_ = 2 THEN DO;
		name2 = 'new name';
	END;
RUN;

DATA new2;
SET d1 d2(obs=0);
	
	IF _N_ = 2 THEN DO;
		name2 = 'new name2';
	END;
RUN;

 

I get following results:

Dataset NEW:

ID Name1 name2
1 john
3 mike new name
4 tony
2 isaac

 

But dataset NEW2:

ID Name2 name2

1 john
3 mike new name2
4 tony new name2
2 isaac new name2

 

My question is WHY is second option (using obs=0) of declaration of variables acting as if I would define variables as RETAIN?

Is that a SAS feature or am I missing something here?

 

The reason I am using obs=0 is sometimes I want to add many variables and instead of listing all of them this seems to be a fast way but I do not want them to be as a RETAIN variables.

 

Any explanation would be much appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

All variables that come from datasets are retained (which really means they are not set to missing when a new iteration starts).  The values of the variables are only changed when the next observation is read. That is how SAS is able to do a one many merge.  

 

Note that when the dataset does not contribute to a new BY group then the variables are set to missing.

 

So you might try finding a common variable you can merge "by".  But you will still see the value being retained within the BY group if the other dataset(s) contribute multiple observations for the same BY group.

 

Or just add you own code to clear the variables.  An easy way is to add these two statements to the end of the data step.

output;
call missing(of _all_);

If the idea was just to add the variables then you might do something like this instead.

data want ;
  if 0 then set for_variable_defintions ;
  call missing(of _all_);
  set real_dataset ;
  retain newvar 0 ;
run;

When you use the _ALL_ variable list it will include only the variables that the SAS compiler has already seen. So in the code above that means only the variables defined in the set statement that will never execute because it is the THEN clause of an IF statement that is always false.  So in particular the variable NEWVAR that does want to be retained is not yet defined and so is not set to missing by the call missing() function call.

 

View solution in original post

5 REPLIES 5
SuryaKiran
Meteorite | Level 14

Even though you set OBS=0  descriptor information for that data set is created. When you submit a SAS Data step it is first compiled and then executed. During the compilation PDV and Descriptor portion are set. 

 

For more information how data step is processed got through this: https://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000985872.htm

Thanks,
Suryakiran
ChrisNZ
Tourmaline | Level 20

In NEW2, since you declared name2 as part of a data set, name2 will only get refreshed when a value is read from a dataset.

 

Since this does not happen, there is no refresh.

plevcek
Fluorite | Level 6

The refresh part makes sense but looking at the functionality using SET with 2 datasets seems a wrong way to go in my case.

 

E.g. If we look only at the variable 'Name2' then observations that belong to Data1 will have name2 acting as a retain variable and observations of Data2 will be refreshed for each observation.

So If i would say

IF id=2 THEN name2 = 'Tim';

Then it could happen that observations from Data1 which have ID different than 2 could have value of name2 equal to 'Tim'. (would need to change the order in Data1 and put ID=2 observation at the beginning).

 

I did another test and used the

BY id;

option (and sort it prior of course) and then it does not retain values anymore.

 

I have to admit I am a bit confused about the rules here. I would expect it to refresh the whole observation including the variables that are not in the current observation.

 

 

Tom
Super User Tom
Super User

All variables that come from datasets are retained (which really means they are not set to missing when a new iteration starts).  The values of the variables are only changed when the next observation is read. That is how SAS is able to do a one many merge.  

 

Note that when the dataset does not contribute to a new BY group then the variables are set to missing.

 

So you might try finding a common variable you can merge "by".  But you will still see the value being retained within the BY group if the other dataset(s) contribute multiple observations for the same BY group.

 

Or just add you own code to clear the variables.  An easy way is to add these two statements to the end of the data step.

output;
call missing(of _all_);

If the idea was just to add the variables then you might do something like this instead.

data want ;
  if 0 then set for_variable_defintions ;
  call missing(of _all_);
  set real_dataset ;
  retain newvar 0 ;
run;

When you use the _ALL_ variable list it will include only the variables that the SAS compiler has already seen. So in the code above that means only the variables defined in the set statement that will never execute because it is the THEN clause of an IF statement that is always false.  So in particular the variable NEWVAR that does want to be retained is not yet defined and so is not set to missing by the call missing() function call.

 

plevcek
Fluorite | Level 6
Thank you. Very helpful explanation.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 949 views
  • 0 likes
  • 4 in conversation