Solved: Re: How to combine two variables into one variable

walker105 · Posted 02-09-2021 12:26 AM

Hi, Community

I am trying to create a new variable combining two variables. My data looks like this (if pre_a has a value, post_a doesn't. If post_a has a value, pre_a doesn't..):

pre_a post_a pre_b post_b

2 . 3 .

. . . 2

. 3 3 .

. 3 . .

I would like to create a new variable 'Apple' combining pre_a and post_a, and 'Ball' combining pre_b and post_b. The dataset I would like to get looks like this:

pre_a post_a apple pre_b post_b ball

2 . 2 3 . 3

. . . . 2 2

. 3 3 3. . 3

. 3 3 . . .

I tried to use the following code:

apple=pre_a; if apple ne ' ' then output;
apple=post_a; if apple ne ' ' then output;

ball=pre_b; if ball ne ' ' then output;
ball=post_b; if ball ne ' ' then output;

keep apple ball; run;

If using the code, I got the apple variable, correctly, but not a ball variable. Could you give some advice to do this? Or, is there different way to create a new variable combining two variables?

Your answers would be very helpful! Thanks!

ballardw · Posted 02-09-2021 01:46 AM

Part of what I believe to be your problem is that when you use OUTPUT the second variable has not been set yet. If you check the log you should see that your output data set has close to twice as many output records and input unless you have many records where both the pre and post values are missing.

I can't tell whether your variables are character or not (too many people mix statements with inappropriate operations for variable type)

Here is an example similar to the logic you are using. I am using numeric values but the behavior is similar with character.

data have;
   input x1 x2 y1 y2;
datalines;
1 . . 2
. 2 . 2
1 . 2 .
;

data example;
   set have;
   xx = x1;
   if xx ne . then output;
   xx = x2;
   if xx ne . then output;
   yy = y1;
   if yy ne . then output;
   yy = y2;
   if yy ne . then output;
run;

Notice that the Example data set has 6 records in the output when only 3 are read in. Also the YY value is set on the 2nd, 4th and 6th records.

That is why I think you want the coalescec or coalesce function as suggested by @mkeintz . Example with the same data:

data better;
   set have;
   xx= coalesce (x1,x2);
   yy= coalesce (y1,y2);
run;

If your variables are character then this may go what you are expecting:

data want;
   set have;
   apple = coalescec (pre_a, post_a);
   ball = coalescec (pre_b, post_b);
run;

if the values are actually numeric instead of character the function would be Coalesce instead.

The two functions Coalesce and Coalescec return the first value found in the list of values which could be considerably more than just 2 values or variables.

When you have questions about unexpected behavior of code the best thing is to copy the entire data step or procedure code from the log with all the messages or warnings and then paste into a text box opened on the forum using the </> such as:

134  data example;
135     set have;
136     xx = x1;
137     if xx ne . then output;
138     xx = x2;
139     if xx ne . then output;
140     yy = y1;
141     if yy ne . then output;
142     yy = y2;
143     if yy ne . then output;
144  run;

NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.EXAMPLE has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The notes here clearly show reading 3 records and output 6. Since your example data has missing values for some of the pairs the number would not be exactly twice the number on the input set but I would expect it to be notably larger.

You also make it harder to tell what happened by only keeping the resulting Apple and Ball variables. When debugging code behavior you want to keep all of the variables that are involved so you can see what is calculated from which values.

View solution in original post

mkeintz · Posted 02-09-2021 12:37 AM

Take a look at the Coalesce function.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

walker105 · Posted 02-09-2021 12:51 PM

Thanks! The Coalesce function works well for my data!

andreas_lds · Posted 02-09-2021 01:31 AM

And what should the value of "apple" be, if pre_a = 4 and post_a = 5? Or is it impossible that both variable are non-missing?

ballardw · Posted 02-09-2021 01:46 AM

Part of what I believe to be your problem is that when you use OUTPUT the second variable has not been set yet. If you check the log you should see that your output data set has close to twice as many output records and input unless you have many records where both the pre and post values are missing.

I can't tell whether your variables are character or not (too many people mix statements with inappropriate operations for variable type)

Here is an example similar to the logic you are using. I am using numeric values but the behavior is similar with character.

data have;
   input x1 x2 y1 y2;
datalines;
1 . . 2
. 2 . 2
1 . 2 .
;

data example;
   set have;
   xx = x1;
   if xx ne . then output;
   xx = x2;
   if xx ne . then output;
   yy = y1;
   if yy ne . then output;
   yy = y2;
   if yy ne . then output;
run;

Notice that the Example data set has 6 records in the output when only 3 are read in. Also the YY value is set on the 2nd, 4th and 6th records.

That is why I think you want the coalescec or coalesce function as suggested by @mkeintz . Example with the same data:

data better;
   set have;
   xx= coalesce (x1,x2);
   yy= coalesce (y1,y2);
run;

If your variables are character then this may go what you are expecting:

data want;
   set have;
   apple = coalescec (pre_a, post_a);
   ball = coalescec (pre_b, post_b);
run;

if the values are actually numeric instead of character the function would be Coalesce instead.

The two functions Coalesce and Coalescec return the first value found in the list of values which could be considerably more than just 2 values or variables.

When you have questions about unexpected behavior of code the best thing is to copy the entire data step or procedure code from the log with all the messages or warnings and then paste into a text box opened on the forum using the </> such as:

134  data example;
135     set have;
136     xx = x1;
137     if xx ne . then output;
138     xx = x2;
139     if xx ne . then output;
140     yy = y1;
141     if yy ne . then output;
142     yy = y2;
143     if yy ne . then output;
144  run;

NOTE: There were 3 observations read from the data set WORK.HAVE.
NOTE: The data set WORK.EXAMPLE has 6 observations and 6 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds

The notes here clearly show reading 3 records and output 6. Since your example data has missing values for some of the pairs the number would not be exactly twice the number on the input set but I would expect it to be notably larger.

You also make it harder to tell what happened by only keeping the resulting Apple and Ball variables. When debugging code behavior you want to keep all of the variables that are involved so you can see what is calculated from which values.

walker105 · Posted 02-09-2021 12:52 PM

Thank you for the explanation. I tried to use the Coalesce function, and it works well! Thanks.

How to combine two variables into one variable

Re: How to combine two variables into one variable

Re: How to combine two variables into one variable

Re: How to combine two variables into one variable

Re: How to combine two variables into one variable

Re: How to combine two variables into one variable

Re: How to combine two variables into one variable

Registration is open

SAS Training: Just a Click Away