Hi.
How do I merge two columns like these two:
Obs Height1 Height2
1 150
2 160
3 170 170
4 180 190
5 NA NA
The result should look like
Obs Height1 Height2 HeightA HeightB
1 150 150 150
2 160 160 160
3 170 170 170 170
4 180 190 180 190
5 NA NA NA NA
Thanks.
Perhaps a simple approach is best:
data want;
set have;
if Height1 > ' ' then HeightA = Height1;
else HeightA = Height2;
If Height2 > ' ' then HeightB = Height2;
else HeightB = Height1;
run;
At least consider where the proposed solutions are the same and where they are different, especially observation 4.
Actually, this should be equivalent to a slight variation on a previous proposal:
data want;
set have;
HeightA = coalesce(Height1, Height2);
HeightB = coalesce(Height2, Height1);
run;
heighta=coalesce(height1,height2);
heightb=coalesce(height1,height2);
Thank you @novinosrin @hashman @Astounding!
@novinosrin has the right idea, however it has to be wrapped into something executable and furnished with some details to guard against truncation in case the lengths of the two H-variables are different. Note that I made the lengths of H1 and H2 different on purpose.
data have ; input H1:$3. H2:$5. ; cards ; 130 . . 140.5 150 150 . 160.5 170 . 180 180 NA NA ; run ; data want ;
set have ;
HA = H1 || H2 ;
HB = HA ;
HA = coalesceC (H1, H2) ;
HB = HA ;
run ;
The first two statements after SET set the lengths of HA and HB to the sum of the system length of H1 and H2. Otherwise, they would be assigned length $200, which could be way too long or two short (resulting in truncation) depending on the actual lengths of H1 and H2; and it's better to be safe than sorry. Then the COALESCE* function (coalesceC here since it is the DATA step, not SQL, and the variables are character) does what @novinosrin has suggested.
It would be still better to assign HA and HB the length equal to the maximum of the lengths of H1 and H2. Though this requires an extra step, it's a very quick one, run time wise:
data _null_ ; set have (obs=0) ; call symputx ("L", vlength (H1) <> vlength (H2)) ; stop ; run ; data want ; set have ; HA = put (coalesceC (H1, H2), $&L..) ; HB = HA ; run ;
HTH
Paul D.
Perhaps a simple approach is best:
data want;
set have;
if Height1 > ' ' then HeightA = Height1;
else HeightA = Height2;
If Height2 > ' ' then HeightB = Height2;
else HeightB = Height1;
run;
At least consider where the proposed solutions are the same and where they are different, especially observation 4.
Actually, this should be equivalent to a slight variation on a previous proposal:
data want;
set have;
HeightA = coalesce(Height1, Height2);
HeightB = coalesce(Height2, Height1);
run;
Your approach will work if the original variables are of the same lengths.
However, it could result in truncation if one variable is shorter/longer than the other.
If both are shorter than $200, it could be alleviated by:
data want ; set have ; HeightA = ifC (Height1 > '', Height1, Height2) ; HeightB = ifC (Height2 > '', Height2, Height1) ; run ;
But that would set HeightA and HeightB to the default $200, which is not good for space management. If the original variables should happen to be longer than $200, then there again could be problems with truncation. The same is true for the COALESCE approach, not to mention that in the DATA step, this function is numeric and won't work in this case - COALESCEC is needed instead.
Failing to size up receiving character variables properly is a common coding problem in SAS with its fixed character lengths. I don't see how including a simple safeguard (e.g., using the concatenation - or, better, a short preliminary extra DATA step to get the max length of the two columns) is more complex (or even much lengthier code-wise) than the approach you consider "simple". Oftentimes, apparent simplicity incurs a considerable cost as far as the validity of the result is concerned.
Paul D.
Paul D.,
All wise things to consider. Now that you mention COALESCEC, I may need to sit down for a week and read about all the functions that SAS has added over the years. Who knows, I may even learn parsing functions.
Here's the "it depends" part. In practice, I try to gear my answers toward the person who asks. Depending on (my perception of) their level of SAS knowledge, I will either add to or remove from my replies. So rightly or wrongly, I am prejudging how much a poster can absorb. I have seen far too much evidence that convinces me this is the right approach ... posters accepting wrong answers, posters leaving out key information that a problem requires, posters requiring that a solution use a certain tool even if they don't know how to solve the problem, posters accepting solutions that they clearly don't understand. For my benefit, I'm happy to see your concatenation approach.
Your approach is sensible and economical. Adding "btw"s, more context, and pointing at underwater reefs is an old habit from SAS-L of yore inherited from Ian Whitlock when he reigned there. In fact, I've learned more from those extras than from direct pointed answers to questions of the "how to do this" type. An extra consideration behind a more, uh, developed reply is that it is read not only by that who posed the question but by others, and the reply can be augmented or contradicted. More than once, straightened up by folks who knew far more than I did, I realized that I'd formed a wrong idea, and those lessons have seared in my memory. In the short time I've been around in this space, it's happened, too, and I'm looking forward to such friendly contradictions.
Best
Paul D.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.