shengnian
Fluorite | Level 6

data aaa;
aa = 1; b = 3; output;
run;

data ccc;
aa = 2; output;
aa = 3; output;
aa = 3; output;
aa = 3; output;
aa = 4; output;
aa = 5; output;
run;

data aab;
   put _all_;            /* PDV before the SET statement executes */
   set aaa ccc;
   /* by aa; */
   if aa = 3 then do;
      b = 1;
      b1 = 2;
   end;
   put _all_;            /* PDV after the assignments */
run;

 

As above, when I comment out the BY statement, b is retained as 1 when aa = 4 and 5. However, when I uncomment the BY statement, the value of b becomes missing. What happens when a BY statement is used with a SET statement?

 

By the way, if the SET statement is replaced by a MERGE statement, the value of b never becomes 1 when aa = 4 or 5, whether or not the BY statement is commented out.

Astounding
PROC Star

You're looking at the effects of a few features.

 

Variables that come from a SAS data set are automatically retained.  That includes B, since it comes from AAA.  Without a BY statement, once you set B to 1 nothing replaces it for the rest of the DATA step, so it remains 1 from that point forward.  The software still has to decide when to set variables read from a SAS data set back to missing, and it does so whenever it switches from one data set to another.

 

You might be interested to compare that to what happens if you make a slight change to your program:

 

if aa=4 then do;
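
In context, that change might look like the step below. This is only a sketch: it reuses AAA and CCC from the question, and the output name aab4 is made up for illustration. Run it once as written and once with the BY statement commented out, then compare the value of B on the aa = 5 line in the log.

```sas
/* Sketch: the step from the question, with B assigned at aa = 4. */
/* aab4 is a hypothetical output name; aaa and ccc are the        */
/* data sets created in the original post.                        */
data aab4;
   set aaa ccc;
   by aa;                 /* comment this out for the second run */
   if aa = 4 then do;
      b  = 1;
      b1 = 2;
   end;
   put _all_;             /* write the program data vector to the log */
run;
```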

 

With a BY statement, the software has an additional decision to make: should it ever reset retained variables to missing?  The answer depends on whether you use SET or MERGE.  With SET + BY, the software resets retained variables to missing when it begins reading observations from a new data set, and also when the BY group changes, which is why B goes missing at aa = 4 in your example.  With MERGE + BY, the software resets retained variables to missing when it begins a new value of a BY variable.
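
One way to see the difference for yourself is to run the two forms side by side and read the log. The steps below are a minimal sketch, assuming AAA and CCC exist as created in the question and are sorted by AA. The output names set_by and merge_by and the text labels in the PUT statements are invented here.

```sas
/* Sketch: same assignment under SET + BY and under MERGE + BY. */
/* set_by and merge_by are hypothetical output names.           */
data set_by;
   set aaa ccc;
   by aa;
   if aa = 3 then b = 1;
   put 'SET+BY   ' aa= b=;    /* one log line per observation */
run;

data merge_by;
   merge aaa ccc;
   by aa;
   if aa = 3 then b = 1;
   put 'MERGE+BY ' aa= b=;
run;
```

Comparing the b= values line by line shows where each form puts B back to missing.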

shengnian
Fluorite | Level 6

With SET + BY, the software re-sets retained variables to missing when it begins reading observations from a new data set.

 

Can you explain more about the new data set?  Very grateful.

Astounding
PROC Star

Consider an abbreviated version of your example:

 

data combined;

set aaa ccc;

by aa;

run;

 

AAA contains both AA and B. CCC contains AA only.

 

As the DATA step processes the observations, it alternates reading observations from AAA and CCC.  As part of that process, whenever it switches from one data set to the other, it reinitializes B to missing.  After that, if the next observation comes from AAA, it replaces B.  If the next observation comes from CCC, it does not replace B.
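
To watch this happen, it can help to flag which data set supplied each observation. The sketch below uses the IN= data set option and the FIRST.AA variable that the BY statement creates. The flag names from_aaa and from_ccc are made up for illustration, and AAA and CCC are assumed to be the data sets from the question.

```sas
/* Sketch: show the source data set and BY-group start for each row. */
/* from_aaa and from_ccc are hypothetical IN= flag names.            */
data combined;
   set aaa(in=from_aaa) ccc(in=from_ccc);
   by aa;
   /* from_aaa = 1 when the current observation was read from AAA,   */
   /* from_ccc = 1 when it was read from CCC; first.aa = 1 on the    */
   /* first observation of each AA value.                            */
   put aa= b= from_aaa= from_ccc= first.aa=;
run;
```

The log then shows, for every observation, the value of B alongside the data set it was read from and whether a new BY group has started.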

shengnian
Fluorite | Level 6

You answered my question perfectly! Thanks, Astounding.
