Hello!
I am using SAS ver9.3 and I am attempting to match observations from two different variables: Subject1 and Subject2. Subject2 doesn't have as many observations as Subject1 due to necessary duplicates and that is causing the issue. When these two variables are in the same dataset using a simple merge, it ends up looking like this:
Subject1 Subject2
A-0001-01 0001-01
A-0001-01 0002-02
A-0001-01 0003-03
B-0002-02
B-0002-02
C-0003-03
C-0003-03
C-0003-03
C-0003-03
Instead, this is what I'm interested in:
Subject1 Subject2
A-0001-01 0001-01
A-0001-01
A-0001-01
B-0002-02 0002-02
B-0002-02
C-0003-03 0003-03
C-0003-03
C-0003-03
C-0003-03
I am new to SAS programming, so I apologize if there might be a simple fix that I'm not realizing. Thank you!
Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:
data combined;
merge subject1 subject2;
run;
Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.
Alternatlvely if you are pressed for time you can use a pretty inefficient “first”
Data final;
Set joined;
By subject1;
If not first.subject1 then subject2 = “”;
run;
That would set them all to null.
To set a key you could just run this on subject1:
Data subject1;
Set subject1;
By subject1;
If first.subject1 then key = substr(subject1, 3,7);
run;
Then just join on the new key value.
Using a proc sql join you could pull the two together fairly simply. Essentially you need a standard key between the two to join on. Proc sql lets you cheat a bit using an on statement. Otherwise you'd need to create something two two can join on. One additional benefit is that joins don't require a sort compared to a merge.
proc sql; create table joined as select a.*, b.* from subject1 a left join subject2 b
on substr(a.a, 3,7) = b.b;
Which would get you:
Subject1 | Subject2 |
A-0001-01 | 0001-01 |
A-0001-01 | 0001-01 |
A-0001-01 | 0001-01 |
B-0002-02 | 0002-02 |
B-0002-02 | 0002-02 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
C-0003-03 | 0003-03 |
Hi Clifton, thank you for your reply. I went ahead and tried the code, but is there a way to keep the blank obervations? Like this:
Subject1 | Subject2 |
A-0001-01 | 0001-01 |
A-0001-01 | |
A-0001-01 | |
B-0002-02 | 0002-02 |
B-0002-02 | |
C-0003-03 | 0003-03 |
C-0003-03 | |
C-0003-03 | |
C-0003-03 |
Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:
data combined;
merge subject1 subject2;
run;
Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.
Alternatlvely if you are pressed for time you can use a pretty inefficient “first”
Data final;
Set joined;
By subject1;
If not first.subject1 then subject2 = “”;
run;
That would set them all to null.
To set a key you could just run this on subject1:
Data subject1;
Set subject1;
By subject1;
If first.subject1 then key = substr(subject1, 3,7);
run;
Then just join on the new key value.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.