Hello all!
I have two datasets with a different number of rows per person_id, and I would like to merge the two datasets in a way that each row from dataset B gets added to all rows for that corresponding person in dataset A.
Below is an example of what the original datasets would look like:
DATASET A:
id value
1 A
1 B
1 C
2 A
2 C
DATASET B:
id med start_date end_date
1 1 1/5/13 2/10/16
1 2 3/10/16 4/10/16
2 1 1/8/13 1/10/13
2 2 3/5/18 5/3/18
2 3 5/5/18 6/1/18
This is what I would like the merged dataset to look like:
id value med start_date end_date
1 A 1 1/5/13 2/10/16
1 A 2 3/10/16 4/10/16
1 B 1 1/5/13 2/10/16
1 B 2 3/10/16 4/10/16
1 C 1 1/5/13 2/10/16
1 C 2 3/10/16 4/10/16
2 A 1 1/8/13 1/10/13
2 A 2 3/5/18 5/3/18
2 A 3 5/5/18 6/1/18
2 C 1 1/8/13 1/10/13
2 C 2 3/5/18 5/3/18
2 C 3 5/5/18 6/1/18
Can anyone advise me how I can merge the two datasets so that all rows for each person in dataset B gets added to each row for that same person in dataset A?
Thank you for your help!
@Astounding's code is missing the "from" part of the create table/select statement:
proc sql;
create table want as
select a.value, b.*
from a, b
where a.id = b.id
;
quit;
The easiest way is PROC SQL:
proc sql;
create table want as
select a.value, b.* where a.id = b.id;
quit;
Astounding, thanks for the reply!
Is there any way to actually do this as a data step without creating a table in proc sql? My understanding about proc sql tables is that I can't actually use that table as a dataset. I want to do this merge in a way that creates an actual dataset that I can manipulate further.
I didn't know that (about the create table option).
The code from Astounding doesn't seem to be working when I try it. I think I'm just not using it correctly. Astounding (or anyone else), could you clarify that code for me? For the code a.value, would "a" be referring to the dataset name and "value" referring to the variable name?
In the example datasets that I gave, I would be trying to copy all columns (value, med, start_date, end_date), with the four values for each id getting added to all rows for that corresponding id from the other dataset.
@Astounding's code is missing the "from" part of the create table/select statement:
proc sql;
create table want as
select a.value, b.*
from a, b
where a.id = b.id
;
quit;
Good catch, @Kurt_Bremser
Regarding the other concerns, such as creating a table, being able to use it, all of those should be fine. Just test it on your own test data to verify for yourself.
@SarahW13 wrote:
I didn't know that (about the create table option).
The code from Astounding doesn't seem to be working when I try it. I think I'm just not using it correctly. Astounding (or anyone else), could you clarify that code for me? For the code a.value, would "a" be referring to the dataset name and "value" referring to the variable name?
In the example datasets that I gave, I would be trying to copy all columns (value, med, start_date, end_date), with the four values for each id getting added to all rows for that corresponding id from the other dataset.
If something isn't working please post the non working code and the log at minimum.
I don't think merge will give you the desired result, the best way to do this is by proc sql.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.