BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
EIL
Fluorite | Level 6 EIL
Fluorite | Level 6

I have two datasets that I would like to concatenate. They have the same information, with the same formatting, but in variables with different names. I tried to combine them using the technique given in the example code below.

 

DATA TEST1;
	INPUT ID VALUE1;
	DATALINES;
	1 0.5
	2 0.6
	3 0.7
	4 .
	;
RUN;

DATA TEST2;
	INPUT ID VALUE2;
	DATALINES;
	5 0.4
	6 0.5
	7 0.3
	8 . 
	;
RUN;

DATA TEST3;
	SET TEST1 TEST2;
	IF MISSING(VALUE1) THEN VALUE1 = VALUE2;
RUN;

PROC PRINT DATA = TEST3 NOOBS; RUN;

This is my output:

 

ID VALUE1 VALUE2

10.5.
20.6.
30.7.
4..
50.40.4
60.40.5
70.40.3
80.4.

 

I tried a few tweaks to dataset TEST2 and it looks like SAS is simply assigning the first non-missing VALUE2 value to all of the rows after that non-missing value appears. For example, if ID 5 has VALUE2 = 0.5, then ID 5, 6, 7, and 8 will all be assigned VALUE1 = 0.5. If ID 5 has VALUE2 = . and ID 6 has VALUE2 = 0.5, then ID 5 will be assigned VALUE1 = . and ID's 6, 7, and 8 will be assigned VALUE2 = 0.5. (To be clear, my desired result is that VALUE1 is assigned the same value as VALUE2 for each of ID 5, 6, 7, 8, so that ID 5 has VALUE1 = 0.4, ID 6 has VALUE1 = 0.5, etc.)

 

For my original program, I was able to get the outcome I wanted by using the RENAME data set option instead of an if statement. But I want to understand why my code is working this way. It seems totally contrary to my understanding of the DATA step. How is SAS even able to assign the same value to all of the subsequent rows? Without a RETAIN statement, shouldn't everything except _N_ and _ERROR_ be reassigned to missing values before the next record is read? It seems impossible, which means I must be misunderstanding something fundamental about the DATA step. 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
DATA TEST3;
	SET TEST1 TEST2;
	
	value1 = coalesce(value1, value2);
RUN;

PROC PRINT DATA = TEST3 NOOBS; RUN;

Save it to a new variable solves the issue. 

 

The reassigning to a variable that already exists is introducing an implicit retain. Changing the variable name fixes the issue. This at least gives you the workaround. I'll leave the full technical discussion to someone else. 

View solution in original post

5 REPLIES 5
novinosrin
Tourmaline | Level 20

Hi @EIL  All variables read using a SET statement are automatically retained. In your code values of both value1 and value2 would indeed be retained unless you explicitly assign missing during earch iteration of the datastep.

EIL
Fluorite | Level 6 EIL
Fluorite | Level 6

Oh, I see, of course! That is very helpful information, thank you.

 

I'm still left confused, though. I don't understand why the IF/THEN statement would use the retained value instead of a value from the new data record. Also, shouldn't it be retaining the previous row's value each time, instead of assigning the same value to all of the subsequent records?

 

I tried experimenting with some other DATA statements to try to get a grip on what's happening here:

 

DATA TEST3;
	SET TEST1 TEST2;
	IF NOT MISSING(ID) THEN MYVARA = VALUE2;
	MYVARB = VALUE2;
	IF MISSING(VALUE1) THEN VALUE1 = VALUE2;
	MYVARC = VALUE2;
RUN;

But I got the following results:

ID VALUE1 VALUE2 MYVARA MYVARB MYVARC

10.5....
20.6....
30.7....
4.....
50.40.40.40.40.4
60.40.50.50.50.5
70.40.30.30.30.3
80.4....

 

The variable assignments are working like they "should" for every situation except reassigning VALUE1. I'm still confused about what's different between the case where SAS assigns the first nonmissing VALUE2 value, and the cases where SAS assigns the "correct" VALUE2 value. 

Reeza
Super User
DATA TEST3;
	SET TEST1 TEST2;
	
	value1 = coalesce(value1, value2);
RUN;

PROC PRINT DATA = TEST3 NOOBS; RUN;

Save it to a new variable solves the issue. 

 

The reassigning to a variable that already exists is introducing an implicit retain. Changing the variable name fixes the issue. This at least gives you the workaround. I'll leave the full technical discussion to someone else. 

EIL
Fluorite | Level 6 EIL
Fluorite | Level 6
Thanks, that's very helpful! I guess my question was really "how do I avoid unexpected results like these" and it seems as though "don't reassign to variables that already exists" is the answer. I would appreciate a more technical explanation if anyone is up for giving it, but that's the basic information I needed.

That coalesce function looks really useful, by the way, thank you for mentioning it.
Tom
Super User Tom
Super User

Variables sourced from datasets are always retained. That is why many to one merges work.  

data decodes;
   input code decode :$10. @@;
cards;
1 red 2 blue 3 green
;
data items;
  input code id @@;
cards;
1 1  1 2 1 3
2 4 2 5 
3 6 3 7
;
data want;
  merge items decodes;
  by code;
run;
Obs    code    id    decode

 1       1      1    red
 2       1      2    red
 3       1      3    red
 4       2      4    blue
 5       2      5    blue
 6       3      6    green
 7       3      7    green

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 981 views
  • 3 likes
  • 4 in conversation