Hello!
I am using SAS ver9.3 and I am attempting to match observations from two different variables: Subject1 and Subject2. Subject2 doesn't have as many observations as Subject1 due to necessary duplicates and that is causing the issue. When these two variables are in the same dataset using a simple merge, it ends up looking like this:
Subject1 Subject2
A-0001-01 0001-01
A-0001-01 0002-02
A-0001-01 0003-03
B-0002-02
B-0002-02
C-0003-03
C-0003-03
C-0003-03
C-0003-03
Instead, this is what I'm interested in:
Subject1 Subject2
A-0001-01 0001-01
A-0001-01
A-0001-01
B-0002-02 0002-02
B-0002-02
C-0003-03 0003-03
C-0003-03
C-0003-03
C-0003-03
I am new to SAS programming, so I apologize if there might be a simple fix that I'm not realizing. Thank you!
Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:
data combined;
merge subject1 subject2;
run;
Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.
Alternatlvely if you are pressed for time you can use a pretty inefficient “first”
Data final;
Set joined;
By subject1;
If not first.subject1 then subject2 = “”;
run;
That would set them all to null.
To set a key you could just run this on subject1:
Data subject1;
Set subject1;
By subject1;
If first.subject1 then key = substr(subject1, 3,7);
run;
Then just join on the new key value.
Using a proc sql join you could pull the two together fairly simply. Essentially you need a standard key between the two to join on. Proc sql lets you cheat a bit using an on statement. Otherwise you'd need to create something two two can join on. One additional benefit is that joins don't require a sort compared to a merge.
proc sql; create table joined as select a.*, b.* from subject1 a left join subject2 b
on substr(a.a, 3,7) = b.b;
Which would get you:
Subject1 | Subject2 |
A-0001-01 | 0001-01 |
A-0001-01 | 0001-01 |
A-0001-01 | 0001-01 |
B-0002-02 | 0002-02 |
B-0002-02 | 0002-02 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
Hi Clifton, thank you for your reply. I went ahead and tried the code, but is there a way to keep the blank obervations? Like this:
Subject1 | Subject2 |
A-0001-01 | 0001-01 |
A-0001-01 | |
A-0001-01 | |
B-0002-02 | 0002-02 |
B-0002-02 | |
C-0003-03 | 0003-03 |
C-0003-03 | |
C-0003-03 | |
C-0003-03 |
Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:
data combined;
merge subject1 subject2;
run;
Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.
Alternatlvely if you are pressed for time you can use a pretty inefficient “first”
Data final;
Set joined;
By subject1;
If not first.subject1 then subject2 = “”;
run;
That would set them all to null.
To set a key you could just run this on subject1:
Data subject1;
Set subject1;
By subject1;
If first.subject1 then key = substr(subject1, 3,7);
run;
Then just join on the new key value.
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.