DATA Step, Macro, Functions and more

Merging two datasets with different number of rows per id

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 19
Accepted Solution

Merging two datasets with different number of rows per id

Hello all!

 

I have two datasets with a different number of rows per person_id, and I would like to merge the two datasets in a way that each row from dataset B gets added to all rows for that corresponding person in dataset A.

 

Below is an example of what the original datasets would look like:

 

DATASET A:

id  value

1  A

1  B

1  C

2  A

2  C

 

DATASET B:

id  med   start_date   end_date

1    1          1/5/13        2/10/16

1    2          3/10/16      4/10/16

2    1           1/8/13        1/10/13

2    2           3/5/18        5/3/18

2    3           5/5/18        6/1/18

 

This is what I would like the merged dataset to look like:

id value  med   start_date   end_date

1    A        1          1/5/13        2/10/16

1    A        2          3/10/16      4/10/16

1    B        1          1/5/13        2/10/16

1    B        2          3/10/16      4/10/16

1    C        1          1/5/13        2/10/16

1    C        2          3/10/16      4/10/16

2    A        1           1/8/13        1/10/13

2    A        2           3/5/18        5/3/18

2    A        3           5/5/18        6/1/18

2    C        1           1/8/13        1/10/13

2    C        2           3/5/18        5/3/18

2    C        3           5/5/18        6/1/18

 

Can anyone advise me how I can merge the two datasets so that all rows for each person in dataset B gets added to each row for that same person in dataset A?

 

Thank you for your help!


Accepted Solutions
Solution
a week ago
Super User
Posts: 9,873

Re: Merging two datasets with different number of rows per id

@Astounding's code is missing the "from" part of the create table/select statement:

proc sql;
create table want as
select a.value, b.*
from a, b
where a.id = b.id
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code

View solution in original post


All Replies
Super User
Posts: 6,628

Re: Merging two datasets with different number of rows per id

The easiest way is PROC SQL:

 

proc sql;

create table want as

select a.value, b.* where a.id = b.id;

quit;

Occasional Contributor
Posts: 19

Re: Merging two datasets with different number of rows per id

Posted in reply to Astounding

Astounding, thanks for the reply!

 

Is there any way to actually do this as a data step without creating a table in proc sql? My understanding about proc sql tables is that I can't actually use that table as a dataset. I want to do this merge in a way that creates an actual dataset that I can manipulate further.

Super User
Posts: 23,237

Re: Merging two datasets with different number of rows per id

Proc SQL creates datasets if you have the CREATE TABLE portion. The SELECT alone does not.
Occasional Contributor
Posts: 19

Re: Merging two datasets with different number of rows per id

Posted in reply to Astounding

I didn't know that (about the create table option).

 

The code from Astounding doesn't seem to be working when I try it. I think I'm just not using it correctly. Astounding (or anyone else), could you clarify that code for me? For the code a.value, would "a" be referring to the dataset name and "value" referring to the variable name?

 

In the example datasets that I gave, I would be trying to copy all columns (value, med, start_date, end_date), with the four values for each id getting added to all rows for that corresponding id from the other dataset.

Solution
a week ago
Super User
Posts: 9,873

Re: Merging two datasets with different number of rows per id

@Astounding's code is missing the "from" part of the create table/select statement:

proc sql;
create table want as
select a.value, b.*
from a, b
where a.id = b.id
;
quit;
---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 6,628

Re: Merging two datasets with different number of rows per id

Posted in reply to KurtBremser

Good catch, @KurtBremser

 

Regarding the other concerns, such as creating a table, being able to use it, all of those should be fine.  Just test it on your own test data to verify for yourself.

Super User
Posts: 23,237

Re: Merging two datasets with different number of rows per id


@SarahW13 wrote:

I didn't know that (about the create table option).

 

The code from Astounding doesn't seem to be working when I try it. I think I'm just not using it correctly. Astounding (or anyone else), could you clarify that code for me? For the code a.value, would "a" be referring to the dataset name and "value" referring to the variable name?

 

In the example datasets that I gave, I would be trying to copy all columns (value, med, start_date, end_date), with the four values for each id getting added to all rows for that corresponding id from the other dataset.


If something isn't working please post the non working code and the log at minimum.

Occasional Contributor
Posts: 17

Re: Merging two datasets with different number of rows per id

I don't think merge will give you the desired result, the best way to do this is by proc sql.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 134 views
  • 0 likes
  • 5 in conversation