Solved: Re: Merge datasets with different variables that measure the same thin...

ANKH1 · Posted 07-31-2023 02:26 PM

Hi, I would like to merge multiple datasets by ID. Each dataset has different number of rows. Some datasets have more than one variable of interest, in this case is age, that is, the same dataset might have age1, age2, age3. The end goal is to be able to report the frequencies of variable age per ID and be able to know where each data point comes from (depicted by the variable ds). These are the sample datasets:

data ds1;
input ID age1;
datalines;
1 12
3 11
4 10
4 15
;
run;

data ds2;
input ID age2 age3 age4;
datalines;
1 13 13 .
1 23 14 .
2 25 . 21
;
run;

data ds3;
input ID age5;
datalines;
3 22
3 32
3 33
4 12
;
run;

The output that we are aiming for is this before using proc freq by ID:

ID	age	ds
1	12	1
1	13	2
1	13	2
1	23	2
1	14	2
2	25	2
2	21	2
3	11	1
3	22	3
3	32	3
3	33	3
4	10	1
4	15	1
4	12	3

I tried this code but it omits the observation from ds2, ID1, age3=13:

data ages_all;
  /* First dataset */
  set ds1(rename=(age1=age) in=in1 keep=ID age1);
  
  /* Second dataset */
  set ds2(rename=(age2=age age3=age_renamed1 age4=age_renamed2) in=in2 keep=ID age2 age3 age4);
  
  /* Rename age_renamed1 and age_renamed2 to age */
  if in2 then age = coalesce(age_renamed1, age_renamed2);
  
  /* Third dataset */
  set ds3(rename=(age5=age) in=in3 keep=ID age5);

  by ID;
  
  ds = in1 + 2*in2 + 3*in3;
  drop age_renamed1 age_renamed2; /* Drop the temporary variables */
run;

Tom · Posted 07-31-2023 02:34 PM

It would be easier to just use an array.

data want;
  set ds1 ds2 ds3 ;
  by id;
  array x age1-age5;
  do index=1 to dim(x);
     age=x[index];
     if not missing(age) then output;
  end;
  drop age1-age5;
run;

If you really did need to know which value came from which dataset you use the INDEX variable for that. INDEX=1 is from DS1 and INDEX=5 is from DS3 and the other three are all from DS2.

View solution in original post

Tom · Posted 07-31-2023 02:34 PM

It would be easier to just use an array.

data want;
  set ds1 ds2 ds3 ;
  by id;
  array x age1-age5;
  do index=1 to dim(x);
     age=x[index];
     if not missing(age) then output;
  end;
  drop age1-age5;
run;

If you really did need to know which value came from which dataset you use the INDEX variable for that. INDEX=1 is from DS1 and INDEX=5 is from DS3 and the other three are all from DS2.

ANKH1 · Posted 07-31-2023 03:30 PM

Thanks! This is very helpful. Is there a way to say that age2,age3 and age4 belong to index 2? The problem is that we have more than 20 datasets and each one might or might have multiple age variables.

PaigeMiller · Posted 07-31-2023 03:33 PM

Use the INDSNAME= option in the SET statement

Example: https://documentation.sas.com/doc/en/pgmmvacdc/9.4/lestmtsref/p00hxg3x8lwivcn1f0e9axziw57y.htm#p00fw...

--
Paige Miller

Tom · Posted 07-31-2023 04:09 PM

You could do it the same way as in your original program using the IN= dataset option.

For example to make a numeric variable:

set ds1(in=in1) ds2(in=in2) ds3(in=in3);
....
ds=in1 + 2*in2 +3*in3;
...

Or you could use the INDSNAME= option of the SET statement can create a character variable.

set ds1 ds2 ds3 indsname=indsname;
...
dsname=indsname;
...

ANKH1 · Posted 07-31-2023 04:20 PM

Thanks! I used your code with indsname, it worked! How can I accept two answers as solutions?

ballardw · Posted 07-31-2023 03:36 PM

Multiple SET statements behave in a way you likely do not expect.

You really need to watch what the observations notes in the log shows.

Brief example:

data one;
   input x;
datalines;
1
2
3
;

data two;
   input y;
datalines;
11
22
;

data sets;
   set one;
   set two;
run;

How many observations do you expect?

Now, read the LOG:

16   data sets;
17      set one;
18      set two;
19   run;

NOTE: There were 3 observations read from the data set WORK.ONE.
NOTE: There were 2 observations read from the data set WORK.TWO.
NOTE: The data set WORK.SETS has 2 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

3 observations from the first set but the final output data only has 2 observations.

Likely your output only includes the number of observations from the set with the smallest number.

MERGE is a very specific concept in SAS and what you did is not a merge. Especially when you want to combine records according to the value of one or more variables. SAS uses a BY statement with MERGE to align observations on a common value of one or more variables. Caution: Merge does not like, meaning you get unexpected results, if more than one data set contributes multiple values to a Merge.

ANKH1 · Posted 07-31-2023 04:03 PM

Ok, thanks for the explanation. I shouldn't say merge.

ballardw · Posted 07-31-2023 04:34 PM

@ANKH1 wrote:
Ok, thanks for the explanation. I shouldn't say merge.

AND realize that multiple SET statements is a much more complex issue than stacking (appending) data.

Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

Re: Merge datasets with different variables that measure the same thing and consolidate them in one

ID	age	ds
1	12	1
1	13	2
1	13	2
1	23	2
1	14	2
2	25	2
2	21	2
3	11	1
3	22	3
3	32	3
3	33	3
4	10	1
4	15	1
4	12	3

ID	age	ds
1	12	1
1	13	2
1	13	2
1	23	2
1	14	2
2	25	2
2	21	2
3	11	1
3	22	3
3	32	3
3	33	3
4	10	1
4	15	1
4	12	3

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away

ID	age	ds
1	12	1
1	13	2
1	13	2
1	23	2
1	14	2
2	25	2
2	21	2
3	11	1
3	22	3
3	32	3
3	33	3
4	10	1
4	15	1
4	12	3