How to Match Observations from Two Variables

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

How to Match Observations from Two Variables

[ Edited ]

Hello!

 

I am using SAS ver9.3 and I am attempting to match observations from two different variables: Subject1 and Subject2. Subject2 doesn't have as many observations as Subject1 due to necessary duplicates and that is causing the issue. When these two variables are in the same dataset using a simple merge, it ends up looking like this:

 

Subject1               Subject2

A-0001-01              0001-01

A-0001-01              0002-02

A-0001-01              0003-03

B-0002-02

B-0002-02

C-0003-03

C-0003-03

C-0003-03

C-0003-03

 

 

Instead, this is what I'm interested in:

 

Subject1               Subject2

A-0001-01              0001-01

A-0001-01             

A-0001-01             

B-0002-02              0002-02

B-0002-02

C-0003-03              0003-03

C-0003-03

C-0003-03

C-0003-03

 

I am new to SAS programming, so I apologize if there might be a simple fix that I'm not realizing. Thank you!


Accepted Solutions
Solution
‎05-15-2017 02:19 PM
Occasional Contributor
Posts: 9

Re: How to Match Observations from Two Variables

Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:

 

data combined;

merge subject1 subject2;

run;

 

Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.

 

Alternatlvely if you are pressed for time you can use a pretty inefficient “first”

 

Data final;

            Set joined;

            By subject1;

            If not first.subject1 then subject2 = “”;

run;

 

That would set them all to null.

To set a key you could just run this on subject1:

 

Data subject1;

            Set subject1;

            By subject1;

            If first.subject1 then key = substr(subject1, 3,7);

run;

 

Then just join on the new key value.

View solution in original post


All Replies
Occasional Contributor
Posts: 9

Re: How to Match Observations from Two Variables

Using a proc sql join you could pull the two together fairly simply. Essentially you need a standard key between the two to join on. Proc sql lets you cheat a bit using an on statement. Otherwise you'd need to create something two two can join on. One additional benefit is that joins don't require a sort compared to a merge. 

 

proc sql; create table joined as select a.*, b.* from subject1 a left join subject2 b
on substr(a.a, 3,7) = b.b;

 

Which would get you:

 

Subject1Subject2
A-0001-010001-01
A-0001-01             0001-01
A-0001-01             0001-01
B-0002-020002-02
B-0002-020002-02
C-0003-030003-03
C-0003-030003-03
C-0003-030003-03
C-0003-030003-03
Occasional Contributor
Posts: 14

Re: How to Match Observations from Two Variables

Hi Clifton, thank you for your reply. I went ahead and tried the code, but is there a way to keep the blank obervations? Like this:

 

Subject1Subject2
A-0001-010001-01
A-0001-01              
A-0001-01              
B-0002-020002-02
B-0002-02 
C-0003-030003-03
C-0003-03 
C-0003-03 
C-0003-03 
Solution
‎05-15-2017 02:19 PM
Occasional Contributor
Posts: 9

Re: How to Match Observations from Two Variables

Well I was assuming you wanted to combine them based on a key, but if they are the same size and you just want them to be in the same dataset in the same order:

 

data combined;

merge subject1 subject2;

run;

 

Should do it. However if you want to join them with the pseudo key I mentioned above, maybe with additional values, the values would all be populated, unless you create a unique key that you can then join on.

 

Alternatlvely if you are pressed for time you can use a pretty inefficient “first”

 

Data final;

            Set joined;

            By subject1;

            If not first.subject1 then subject2 = “”;

run;

 

That would set them all to null.

To set a key you could just run this on subject1:

 

Data subject1;

            Set subject1;

            By subject1;

            If first.subject1 then key = substr(subject1, 3,7);

run;

 

Then just join on the new key value.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 211 views
  • 2 likes
  • 2 in conversation