BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
walker105
Fluorite | Level 6

Hi, Community 

 

I am trying to create a new variable combining two variables. My data looks like this (if pre_a has a value, post_a doesn't. If post_a has a value, pre_a doesn't..): 

 

pre_a post_a pre_b post_b 

2       .            3         . 

.        .            .          2

.        3           3         .

.        3           .           . 

 

I would like to create a new variable 'Apple' combining pre_a and post_a, and 'Ball' combining pre_b and post_b. The dataset I would like to get looks like this:

 

pre_a post_a apple     pre_b post_b ball 

2       .             2             3         .         3

.        .              .             .          2        2 

.        3            3            3.          .        3

.        3            3             .          .         . 

 

I tried to use the following code:

 

apple=pre_a; if apple ne ' ' then output;
apple=post_a; if apple ne ' ' then output;

ball=pre_b; if ball  ne ' ' then output;
ball=post_b; if ball  ne ' ' then output;

keep apple ball; run; 

 

If using the code, I got the apple variable, correctly, but not a ball variable. Could you give some advice to do this? Or, is there different way to create a new variable combining two variables? 

 

Your answers would be very helpful! Thanks!

 

 

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Part of what I believe to be your problem is that when you use OUTPUT the second variable has not been set yet. If you check the log you should see that your output data set has close to twice as many output records and input unless you have many records where both the pre and post values are missing.

 

I can't tell whether your variables are character or not (too many people mix statements with inappropriate operations for variable type)

Here is an example similar to the logic you are using. I am using numeric values but the behavior is similar with character.

data have;
   input x1 x2 y1 y2;
datalines;
1 . . 2
. 2 . 2
1 . 2 .
;

data example;
   set have;
   xx = x1;
   if xx ne . then output;
   xx = x2;
   if xx ne . then output;
   yy = y1;
   if yy ne . then output;
   yy = y2;
   if yy ne . then output;
run;

Notice that the Example data set has 6 records in the output when only 3 are read in. Also the YY value is set on the 2nd, 4th and 6th records.

That is why I think you want the coalescec or coalesce function as suggested by @mkeintz . Example with the same data:

data better;
   set have;
   xx= coalesce (x1,x2);
   yy= coalesce (y1,y2);
run;

 

If your variables are character then this may go what you are expecting:

data want;
   set have;
   apple = coalescec (pre_a, post_a);
   ball = coalescec (pre_b, post_b);
run;

if the values are actually numeric instead of character the function would be Coalesce instead.

The two functions Coalesce and Coalescec return the first value found in the list of values which could be considerably more than just 2 values or variables.

 

When you have questions about unexpected behavior of code the best thing is to copy the entire data step or procedure code from the log with all the messages or warnings and then paste into a text box opened on the forum using the </> such as:

134  data example;
135     set have;
136     xx = x1;
137     if xx ne . then output;
138     xx = x2;
139     if xx ne . then output;
140     yy = y1;
141     if yy ne . then output;
142     yy = y2;
143     if yy ne . then output;
144  run;

NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.EXAMPLE has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The notes here clearly show reading 3 records and output 6. Since your example data has missing values for some of the pairs the number would not be exactly twice the number on the input set but I would expect it to be notably larger.

 

You also make it harder to tell what happened by only keeping the resulting Apple and Ball variables. When debugging code behavior you want to keep all of the variables that are involved so you can see what is calculated from which values.

 

View solution in original post

5 REPLIES 5
mkeintz
PROC Star

Take a look at the Coalesce  function.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
walker105
Fluorite | Level 6
Thanks! The Coalesce function works well for my data!
andreas_lds
Jade | Level 19

And what should the value of "apple" be, if pre_a = 4 and post_a = 5? Or is it impossible that both variable are non-missing?

ballardw
Super User

Part of what I believe to be your problem is that when you use OUTPUT the second variable has not been set yet. If you check the log you should see that your output data set has close to twice as many output records and input unless you have many records where both the pre and post values are missing.

 

I can't tell whether your variables are character or not (too many people mix statements with inappropriate operations for variable type)

Here is an example similar to the logic you are using. I am using numeric values but the behavior is similar with character.

data have;
   input x1 x2 y1 y2;
datalines;
1 . . 2
. 2 . 2
1 . 2 .
;

data example;
   set have;
   xx = x1;
   if xx ne . then output;
   xx = x2;
   if xx ne . then output;
   yy = y1;
   if yy ne . then output;
   yy = y2;
   if yy ne . then output;
run;

Notice that the Example data set has 6 records in the output when only 3 are read in. Also the YY value is set on the 2nd, 4th and 6th records.

That is why I think you want the coalescec or coalesce function as suggested by @mkeintz . Example with the same data:

data better;
   set have;
   xx= coalesce (x1,x2);
   yy= coalesce (y1,y2);
run;

 

If your variables are character then this may go what you are expecting:

data want;
   set have;
   apple = coalescec (pre_a, post_a);
   ball = coalescec (pre_b, post_b);
run;

if the values are actually numeric instead of character the function would be Coalesce instead.

The two functions Coalesce and Coalescec return the first value found in the list of values which could be considerably more than just 2 values or variables.

 

When you have questions about unexpected behavior of code the best thing is to copy the entire data step or procedure code from the log with all the messages or warnings and then paste into a text box opened on the forum using the </> such as:

134  data example;
135     set have;
136     xx = x1;
137     if xx ne . then output;
138     xx = x2;
139     if xx ne . then output;
140     yy = y1;
141     if yy ne . then output;
142     yy = y2;
143     if yy ne . then output;
144  run;

NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.EXAMPLE has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The notes here clearly show reading 3 records and output 6. Since your example data has missing values for some of the pairs the number would not be exactly twice the number on the input set but I would expect it to be notably larger.

 

You also make it harder to tell what happened by only keeping the resulting Apple and Ball variables. When debugging code behavior you want to keep all of the variables that are involved so you can see what is calculated from which values.

 

walker105
Fluorite | Level 6
Thank you for the explanation. I tried to use the Coalesce function, and it works well! Thanks.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 16149 views
  • 2 likes
  • 4 in conversation