How to contrast sets of IDs, another question

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 75
Accepted Solution

How to contrast sets of IDs, another question

The hypothetical data looks like this:

data temp;

input ID1 ID2 ID3 X Y;

datalines;

11     21     .     41     51

11     .        21  42     52

12     22     .     43     53

12     .       22   44     54

12     .       24   45     55

12     24    .      1       2

;

run;

The idea is: For each cluster of ID1, find equal values of ID2 and ID3. Then for each pair of equal values, calculate the difference between Y and X, where Y is taken from corresponding ID3 and X from ID2. For example, in the case ID1=12, we have one where ID2=24 and ID3=24. The difference then will be 55-1=54. The same applies for ID2=ID3=22 (diff = 54-43=11) and ID2=ID3=11 (diff=52-41=11).

There always exists pairs of equal ID2 and ID3 for each ID1. Plus, the pattern of missing and non-missing values of ID2 and ID3 is also in the style of this data; for each ID1, there will be always one missing and one non-missing of ID2 and ID3, whichever the case. That is, there will be no situation like this:

ID1     ID2     ID3     X     Y

13      23       23      47    51


Accepted Solutions
Solution
‎07-17-2013 10:31 PM
Super User
Posts: 9,691

Re: How to contrast sets of IDs, another question

I recommended using dataset for your situation caring about order of obs.

data temp;
input ID1 ID2 ID3 X Y;
datalines;
11     21     .     41     51
11     .        21  42     52
12     22     .     43     53
12     .       22   44     54
12     .       24   45     55
12     24    .      1       2
;
run;
data want;
 merge temp(keep=id1 id2 x where=(id2 is not missing)) temp(keep=id3 y where=(id3 is not missing));
 dif=y-x;
run;

Ksharp

View solution in original post


All Replies
Super Contributor
Posts: 418

Re: How to contrast sets of IDs, another question

data Yvalues(keep=ID1 Id2 Id3 Y) Xvalues(Keep=ID1 ID2 ID3 X);

set temp;

if missing(ID2)=1 then output Yvalues;

else output Xvalues;

run;

proc sql;

create table answer as

select xv.id1

,id2

,id3

,x

,y

,y-x as answer

from Xvalues xv

join Yvalues Yv on yv.id1=xv.id1 and yv.id3=xv.id2;

quit;

run;


Hope that helps. This only works specifically on the conditions you specified

Respected Advisor
Posts: 3,124

Re: How to contrast sets of IDs, another question

For ID1=12, instead of 22-22-24-24 ( two different pairs), is there any chance that you have 22-22-22-22 (two same pairs)? and if you do, how do you want to proceed?

Haikuo

Frequent Contributor
Posts: 75

Re: How to contrast sets of IDs, another question

In a word, no. For each ID1, there is only one pair of ID2-ID3.

Respected Advisor
Posts: 3,124

Re: How to contrast sets of IDs, another question

In that case, I think you already have your answer by

Update: or for a one step solution,

data temp;

input ID1 ID2 ID3 X Y;

datalines;

11 21 . 41 51

11 . 21 42 52

12 22 . 43 53

12 . 22 44 54

12 . 24 45 55

12 24 . 1 2

;

run;

proc sql;

  create table want as

  select *, y-x as dif as diff from temp (drop=id3 y where=(not missing(id2))) a,

temp(drop=id2 x where=(not missing(id3))) b

where a.id1=b.id1

and a.id2=b.id3

;

quit;

Haikuo

Super Contributor
Posts: 418

Re: How to contrast sets of IDs, another question

Nice. Glad I was on the right track.

Also, that is a crazy secondary sql pass Haikou! I didn't know you could do a multiple select with a where statement to join. Thanks for posting that

Frequent Contributor
Posts: 75

Re: How to contrast sets of IDs, another question

Thanks. I guess it's time to have some sql education. That seems to be the gold standard for sorts of question I've made.

Super Contributor
Posts: 418

Re: How to contrast sets of IDs, another question

Actually you don't need sql at all for this question. Note that in my first query I split the data using a data step, and then I choose to merge the data using sql. It's just as efficient to simply use a merge statement to re-join the data (you have to re-name the columns).

However, Hai.Kou's query is only one step, instead of the 4 you would need to merge the data. On large datasets however I think the merge might be more efficient since the join is joining every element together and then filtering it down. That is my guess tho, as I have never seen a sql query like this (I use sql server, which you can't do this type of query in).

But yeah, ANYTHING you can do in sql you can do in a number of data steps, sometimes the sql is just easier to write.

Solution
‎07-17-2013 10:31 PM
Super User
Posts: 9,691

Re: How to contrast sets of IDs, another question

I recommended using dataset for your situation caring about order of obs.

data temp;
input ID1 ID2 ID3 X Y;
datalines;
11     21     .     41     51
11     .        21  42     52
12     22     .     43     53
12     .       22   44     54
12     .       24   45     55
12     24    .      1       2
;
run;
data want;
 merge temp(keep=id1 id2 x where=(id2 is not missing)) temp(keep=id3 y where=(id3 is not missing));
 dif=y-x;
run;

Ksharp

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 263 views
  • 6 likes
  • 4 in conversation