DATA Step, Macro, Functions and more

Concentating Datasets certain variable are retained

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 9
Accepted Solution

Concentating Datasets certain variable are retained

Hi Guys,

 

I've got a question about concentation of datasets and the new variables created during he same step. The problem is that if I update a variable that is in the one dataset but not the other. That the value is retained. I've managed to find a solution, but I'm not sure why this is happening to begin with.

 

The sample code is straight forward.

/*dummy data for testing*/

data g1;

  vals = 1.25;

  do i = 1 to 5;

    value = vals * 1;

    valsn = vals; *<-- Will not be in the second dataset.;

    output;

  end;

run;

data g2;

  vals = 1;

  do j = 1 to 7;

       vals = vals + j;

   output;

  end;

run;

 

/*Straight forward concatenation*/

data g3;

set g1(in=a) g2(in=b);

if b then do;

  if j in (1,3,5,7) then do;

    valsn = vals + (j*2);

  end;

end;

run;

proc print data=g3;

run;

 

The value for valsn is retained on all observation coming from g2 regardless. A simple fix is to initialize valsn before the (if j in (1,3,5,7) then doSmiley Wink eg: valsn = .; But I intriged to understand why this is happening to begin with. 

 

Thanks in advance!

 

 


Accepted Solutions
Solution
‎11-19-2015 12:10 AM
Trusted Advisor
Posts: 1,117

Re: Concentating Datasets certain variable are retained

There are indeed situations like the concatenations you describe where the automatic retention of variables read with a SET statement can be somewhat perplexing. However, this has been standard SAS behavior for ages (or ever). It is described, for example, in the online documentation of the RETAIN statement, section "Redundancy": "variables that are read with a SET, MERGE, MODIFY or UPDATE statement" are among those whose "values are automatically retained from one iteration of the DATA step to the next." This is in fact a feature and it is probably used most in match-merging.

 

The important fact that @Astounding mentioned, "when you switch data sets, variables get reinitialized", seems to be harder to find in the documentation, but here it is: in the book Language Reference: Concepts. Quote: "SAS continues to read one observation at a time from the first data set until it finds an end-of-file indicator. The values of the variables in the program data vector are then set to missing, and SAS begins reading observations from the second data set ..."

 

The 2012 conference paper The Use and Abuse of the Program Data Vector (23 pages) takes a detailed look at the internal workings of the data step. In the section "Multiple data sets in the set statement" the author presents an example similar to yours. As far as I see, his argument is wrong where he tries to explain what happens with the first observation of the second dataset! He incorrectly states that a value "remains from the previous data step iteration." I think his own example data and code prove him wrong at this point. So, you see, you're not alone with your confusion. Smiley Happy

 

In your second example VALSN is created with an assignment statement, hence it is not retained (like "all variables that are created with INPUT or assignment statements", as the online doc [see first link above] describes), unless you request this by using a RETAIN statement.

 

View solution in original post


All Replies
Regular Contributor
Posts: 161

Re: Concentating Datasets certain variable are retained

You have coded it that way

set g1(in=a) g2(in=b);
if b then do;
 
Kannan Deivasigamani
Occasional Contributor
Posts: 9

Re: Concentating Datasets certain variable are retained

What I'm struggling to understand is why the value of VALSN is retained in the concatenation and only when VALSN exists in one of the datasets. If VALSN is created in the concatenation part then the value is not retained and acts as I would expect it.

 

 example:

/*dummy data for testing*/

data g1;

vals = 1.25;

do i = 1 to 5;

value = vals * 1;

* valsn = vals; *<-- Will not be in the second dataset.;

output;

end;

run;

data g2;

vals = 1;

do j = 1 to 7;

vals = vals + j;

output;

end;

run;

/*Straight forward concatenation*/

data g3;

set g1(in=a) g2(in=b);

if b then do;

if j in (1,3,5,7) then do;

valsn = vals + (j*2);

end;

end;

run;

proc print data=g3;

run;

 

The valsn is no retained.

Obs VALS I VALUE J VALSN

1 1.25 1 1.25

5 1.25 5 1.25

6 2.00 1 4

7 4.00 2

8 7.00 3 13

9 11.00 4

10 16.00 5 26

11 22.00 6

12 29.00 7 43

 

Where as if VALSN is created before the concatenation The result are:

Obs VALS I VALUE VALSN J

1   1.25 1 1.25  1.25

5   1.25 5 1.25  1.25

6   2.00         4.00  1

7   4.00         4.00  2

8   7.00         13.00 3

9  11.00         13.00 4

10 16.00         26.00 5

11 22.00         26.00 6

12 29.00         43.00 7

 

But the code I've got in place is applicable to: j in (1,3,5,7). Yet the values are being "retained"

Regular Contributor
Posts: 161

Re: Concentating Datasets certain variable are retained

Sorry, I misread your post. I tried a variation of your code and my results are similar when you bring in both datasets together..... need to understand its behaviour....  Here is what I observed with the 2 scenarious I tested:

 

data d1;newvar = 0;run;

data d2;  
  do j = 1 to 5;       
     output;
  end;
run;

data d3;
set d2;
if j = 3 then newvar = 500;
run;

data d4;
set  d1 d2;
  if j = 3 then newvar = 500;
run;

proc print data=d3; title '==d3==';
proc print data=d4; title '==d4==';

 

Resulted in this output:

 

==d3==
Obs	j	newvar
1	1	.
2	2	.
3	3	500
4	4	.
5	5	.
==d4==
Obs	newvar	j
1	0	.
2	.	1
3	.	2
4	500	3
5	500	4
6	500	5

As you can see, in the first example, it presented missing values for j<> 3. However, the value was retained for j >= 3.   I need to dig in a bit and see what I can find....

Kannan Deivasigamani
Solution
‎11-19-2015 12:10 AM
Trusted Advisor
Posts: 1,117

Re: Concentating Datasets certain variable are retained

There are indeed situations like the concatenations you describe where the automatic retention of variables read with a SET statement can be somewhat perplexing. However, this has been standard SAS behavior for ages (or ever). It is described, for example, in the online documentation of the RETAIN statement, section "Redundancy": "variables that are read with a SET, MERGE, MODIFY or UPDATE statement" are among those whose "values are automatically retained from one iteration of the DATA step to the next." This is in fact a feature and it is probably used most in match-merging.

 

The important fact that @Astounding mentioned, "when you switch data sets, variables get reinitialized", seems to be harder to find in the documentation, but here it is: in the book Language Reference: Concepts. Quote: "SAS continues to read one observation at a time from the first data set until it finds an end-of-file indicator. The values of the variables in the program data vector are then set to missing, and SAS begins reading observations from the second data set ..."

 

The 2012 conference paper The Use and Abuse of the Program Data Vector (23 pages) takes a detailed look at the internal workings of the data step. In the section "Multiple data sets in the set statement" the author presents an example similar to yours. As far as I see, his argument is wrong where he tries to explain what happens with the first observation of the second dataset! He incorrectly states that a value "remains from the previous data step iteration." I think his own example data and code prove him wrong at this point. So, you see, you're not alone with your confusion. Smiley Happy

 

In your second example VALSN is created with an assignment statement, hence it is not retained (like "all variables that are created with INPUT or assignment statements", as the online doc [see first link above] describes), unless you request this by using a RETAIN statement.

 

Occasional Contributor
Posts: 9

Re: Concentating Datasets certain variable are retained

Posted in reply to FreelanceReinhard

Thanks for the feedback. This is what I was looking for.

Super User
Posts: 5,509

Re: Concentating Datasets certain variable are retained

It's automatic.  Any variable that comes from a SAS data set is automatically retained.

 

When you switch data sets, variables get reinitialized.  You could see this if you changed your logic to say:

 

if j in (2, 4, 6) then do;

 

Good luck.

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 324 views
  • 6 likes
  • 4 in conversation