We would like to merge two or more datasets that share the same variables (time, Type_a, Type_b, pair). These are two datasets:
DS1:
time | Type_a | Type_b | pair |
time_3 | 1 | 1 | 1 |
time_6 | 2 | 0 | 0 |
time_9 | 0 | 1 | 0 |
time_12 | 1 | 1 | 1 |
time_18 | 1 | 1 | 1 |
DS2:
time | Type_a | Type_b | pair |
time_3 | 2 | 1 | 1 |
time_6 | 1 | 0 | 0 |
time_9 | 1 | 1 | 1 |
time_15 | 0 | 2 | 0 |
We want to get the following output:
time | Type_a_ds1 | Type_b_ds1 | pair_ds1 | Type_a_ds2 | Type_b_ds2 | pair_ds2 |
time_3 | 1 | 1 | 1 | 2 | 1 | 1 |
time_6 | 2 | 0 | 0 | 1 | 0 | 0 |
time_9 | 0 | 1 | 0 | 1 | 1 | 1 |
time_12 | 1 | 1 | 1 | 0 | 0 | 0 |
time_15 | 0 | 0 | 0 | 0 | 2 | 0 |
time_18 | 1 | 1 | 1 | 0 | 0 | 0 |
Notice that the datasets might not share the same values for time. As shown for time_15 (missing in DS1) and time_12 and time_18 (missing in DS2).
Generally a long form with a variable indicating the difference, i.e this record from set1 or set2 is quite often much more flexible.
If you really need to then the "trick" is to rename the variables with dataset options. Dummy code for generic response:
data want; set ds1 (rename=(type_a=type_a_ds1 type_b=type_b_ds1 pair=pair_ds1)) ds2 (rename=(type_a=type_a_ds2 type_b=type_b_ds2 pair=pair_ds2)) ; by time; run;
The rename as a data set option is done before the data is combined.
And that Time variable better sort properly...
Just a few items to consider:
Thanks for your comments. How do you change time from character to numeric for it to sort? Ok, for the using "." instead of 0.
Changing TIME to numeric:
data new_ds1;
set ds1;
numtime = input(substr(time, 6), 3.);
run;
SUBSTR will start with the 6th character, and capture all remaining characters. INPUT converts character to numeric (here, and also in most cases).
Thanks. Do you have any suggestions about the merging of the two datasets to get the desired output?
Generally a long form with a variable indicating the difference, i.e this record from set1 or set2 is quite often much more flexible.
If you really need to then the "trick" is to rename the variables with dataset options. Dummy code for generic response:
data want; set ds1 (rename=(type_a=type_a_ds1 type_b=type_b_ds1 pair=pair_ds1)) ds2 (rename=(type_a=type_a_ds2 type_b=type_b_ds2 pair=pair_ds2)) ; by time; run;
The rename as a data set option is done before the data is combined.
And that Time variable better sort properly...
Thanks! I used "rename".
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 16. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.