BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
msauer
Obsidian | Level 7

We have encountered a strange behaviour in the data step using a SET statement with multiple datasets.

 

The situation is as follows:

data ds1;
  x = 1;
  y = 1;
  output;
  x = 2;
  y = .;
  output;
  x = 3;
  y = 3;
  output;
run;

data ds2;
  x = 4;
  output;
  x = 5;
  output;
  x = 6;
  output;
run;

ds1 has two columns x and y whereas ds only has the x column.

 

In the next step, I want to concatenate the two datasets into one and replace missing values in column y with values from x.

data ds_combined;
  set ds1 ds2;
  if missing(y) then
    y = x;
run;

The result looks like this

x y
1 1
2 2
3 3
4 4
5 4
6 4

which is not what I expected.

I narrowed the issue down to the fact that the y column is in ds1 but not ds2. Everything works as expected

  • if y is set to missing explicitly in ds2 or
  • if concatenation and replacement of missing values happen in two different data steps.

My understanding was that I can use multiple datasets in one SET statement and process the rows as if they come from just one dataset. Apperently this is not the case. Can anyone explain why this happens? Thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
Oligolas
Barite | Level 11

Hi,

yes it's one of the severe pitfalls of SAS.

SAS describes it as 'automatic retain' here

You'll find an explanation there

________________________

- Cheers -

View solution in original post

4 REPLIES 4
SASJedi
SAS Super FREQ

You don't clearly state what you WANT the output to look like. But if the goal is to UPDATE missing values in one dataset with the values from another, you may want to look at the UPDATE statement. And for that, your data sets really should have an ID so you can tell WHICH row needs updating. Try something like this, maybe?

data ds1;
  ID+1;
  x = 1;
  y = 1;
  output;
  ID+1;
  x = 2;
  y = .;
  output;
  ID+1;
  x = 3;
  y = 3;
  output;
run;

/* Create an update dataset for those rows with missing values */
data ds2;
   set ds1;
   where y is missing;
   y=x;
run;

/* Update the missing valuse */
data ds_combined;
   update ds1 ds2;
   by ID;
run;
Check out my Jedi SAS Tricks for SAS Users
Oligolas
Barite | Level 11

Hi,

yes it's one of the severe pitfalls of SAS.

SAS describes it as 'automatic retain' here

You'll find an explanation there

________________________

- Cheers -

yabwon
Onyx | Level 15

I wouldn't say it is a severe pitfall. If one understands the behaviour it's natural way to behave. 🙂

 

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



yabwon
Onyx | Level 15

Hi,

 

if you add some diagnostic to the data step:

data ds_combined;
  put "1) " _ALL_;
  set ds1 ds2 indsname=i;

  put "2) " _ALL_;

  if missing(y) then
    y = x;

  put "3) " _ALL_;
run;

you will see in the log:

1) i=  x=. y=. _ERROR_=0 _N_=1
2) i=WORK.DS1 x=1 y=1 _ERROR_=0 _N_=1
3) i=WORK.DS1 x=1 y=1 _ERROR_=0 _N_=1
1) i=WORK.DS1 x=1 y=1 _ERROR_=0 _N_=2
2) i=WORK.DS1 x=2 y=. _ERROR_=0 _N_=2
3) i=WORK.DS1 x=2 y=2 _ERROR_=0 _N_=2
1) i=WORK.DS1 x=2 y=2 _ERROR_=0 _N_=3
2) i=WORK.DS1 x=3 y=3 _ERROR_=0 _N_=3
3) i=WORK.DS1 x=3 y=3 _ERROR_=0 _N_=3
1) i=WORK.DS1 x=3 y=3 _ERROR_=0 _N_=4
2) i=WORK.DS2 x=4 y=. _ERROR_=0 _N_=4
3) i=WORK.DS2 x=4 y=4 _ERROR_=0 _N_=4
1) i=WORK.DS2 x=4 y=4 _ERROR_=0 _N_=5
2) i=WORK.DS2 x=5 y=4 _ERROR_=0 _N_=5
3) i=WORK.DS2 x=5 y=4 _ERROR_=0 _N_=5
1) i=WORK.DS2 x=5 y=4 _ERROR_=0 _N_=6
2) i=WORK.DS2 x=6 y=4 _ERROR_=0 _N_=6
3) i=WORK.DS2 x=6 y=4 _ERROR_=0 _N_=6
1) i=WORK.DS2 x=6 y=4 _ERROR_=0 _N_=7

Up to _N_ = 3 everything is ok.

In the "2)" for _N_ = 4 so the first place where values form DS2 were used y was set to missing. Since it was missing the IF condition was true so value was replaced by 4. Since Y came from a SAS dataset its value is automatically retained (see "1)" for _N_=5). Also since DS2 doesn't have y as a variable walues of y aren't overwritten.

 

Reversing:

set ds2 ds1;

will show you similar effect.

 

All the best

Bart

 

 

 

 

 

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1625 views
  • 3 likes
  • 4 in conversation