BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
ANKH1
Pyrite | Level 9

We would like to merge two or more datasets that share the same variables (time, Type_a, Type_b, pair).  These are two datasets:

DS1:

timeType_aType_bpair
time_3111
time_6200
time_9010
time_12111
time_18111

DS2:

timeType_aType_bpair
time_3211
time_6100
time_9111
time_15020

We want to get the following output:

timeType_a_ds1Type_b_ds1pair_ds1Type_a_ds2Type_b_ds2pair_ds2
time_3111211
time_6200100
time_9010111
time_12111000
time_15000020
time_18111000

Notice that the datasets might not share the same values for time. As shown for time_15 (missing in DS1) and time_12 and time_18 (missing in DS2).

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Generally a long form with a variable indicating the difference, i.e this record from set1 or set2 is quite often much more flexible.

 

If you really need to then the "trick" is to rename the variables with dataset options. Dummy code for generic response:

data want;
   set 
      ds1 (rename=(type_a=type_a_ds1 type_b=type_b_ds1 pair=pair_ds1))
      ds2 (rename=(type_a=type_a_ds2 type_b=type_b_ds2 pair=pair_ds2))
   ;
   by time;
run;

The rename as a data set option is done before the data is combined.

And that Time variable better sort properly...

View solution in original post

7 REPLIES 7
Astounding
PROC Star

Just a few items to consider:

  1. Why do this at all?  The data appears to be useful in its current form.
  2. Why keep TIME as a character variable?  It would be more useful as numeric, and would then easily sort in the right order.
  3. When there is no measurement (such as time_15 in DS1), why change missing values to a zero?  Zero is a perfectly good measurement on other observations, so why use that to represent a missing value?
ANKH1
Pyrite | Level 9

Thanks for your comments. How do you change time from character to numeric for it to sort? Ok, for the using "." instead of 0. 

Astounding
PROC Star

Changing TIME to numeric:

data new_ds1;
   set ds1;
   numtime = input(substr(time, 6), 3.);
run;

SUBSTR will start with the 6th character, and capture all remaining characters.  INPUT converts character to numeric (here, and also in most cases).

ANKH1
Pyrite | Level 9

Thanks. Do you have any suggestions about the merging of the two datasets to get the desired output?

Reeza
Super User
Transpose each data set to a long data set instead.
Stack the datasets together adding the dataset names using the INDSNAME option.
Transpose back to wide, if necessary using the original variable name + datasetname to uniquely identify each column.
ballardw
Super User

Generally a long form with a variable indicating the difference, i.e this record from set1 or set2 is quite often much more flexible.

 

If you really need to then the "trick" is to rename the variables with dataset options. Dummy code for generic response:

data want;
   set 
      ds1 (rename=(type_a=type_a_ds1 type_b=type_b_ds1 pair=pair_ds1))
      ds2 (rename=(type_a=type_a_ds2 type_b=type_b_ds2 pair=pair_ds2))
   ;
   by time;
run;

The rename as a data set option is done before the data is combined.

And that Time variable better sort properly...

ANKH1
Pyrite | Level 9

Thanks! I used "rename".

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 519 views
  • 0 likes
  • 4 in conversation