BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
NonSleeper
Quartz | Level 8

The hypothetical data looks like this:

data temp;

input ID1 ID2 ID3 X Y;

datalines;

11     21     .     41     51

11     .        21  42     52

12     22     .     43     53

12     .       22   44     54

12     .       24   45     55

12     24    .      1       2

;

run;

The idea is: For each cluster of ID1, find equal values of ID2 and ID3. Then for each pair of equal values, calculate the difference between Y and X, where Y is taken from corresponding ID3 and X from ID2. For example, in the case ID1=12, we have one where ID2=24 and ID3=24. The difference then will be 55-1=54. The same applies for ID2=ID3=22 (diff = 54-43=11) and ID2=ID3=11 (diff=52-41=11).

There always exists pairs of equal ID2 and ID3 for each ID1. Plus, the pattern of missing and non-missing values of ID2 and ID3 is also in the style of this data; for each ID1, there will be always one missing and one non-missing of ID2 and ID3, whichever the case. That is, there will be no situation like this:

ID1     ID2     ID3     X     Y

13      23       23      47    51

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

I recommended using dataset for your situation caring about order of obs.

data temp;
input ID1 ID2 ID3 X Y;
datalines;
11     21     .     41     51
11     .        21  42     52
12     22     .     43     53
12     .       22   44     54
12     .       24   45     55
12     24    .      1       2
;
run;
data want;
 merge temp(keep=id1 id2 x where=(id2 is not missing)) temp(keep=id3 y where=(id3 is not missing));
 dif=y-x;
run;

Ksharp

View solution in original post

8 REPLIES 8
Anotherdream
Quartz | Level 8

data Yvalues(keep=ID1 Id2 Id3 Y) Xvalues(Keep=ID1 ID2 ID3 X);

set temp;

if missing(ID2)=1 then output Yvalues;

else output Xvalues;

run;

proc sql;

create table answer as

select xv.id1

,id2

,id3

,x

,y

,y-x as answer

from Xvalues xv

join Yvalues Yv on yv.id1=xv.id1 and yv.id3=xv.id2;

quit;

run;


Hope that helps. This only works specifically on the conditions you specified

Haikuo
Onyx | Level 15

For ID1=12, instead of 22-22-24-24 ( two different pairs), is there any chance that you have 22-22-22-22 (two same pairs)? and if you do, how do you want to proceed?

Haikuo

NonSleeper
Quartz | Level 8

In a word, no. For each ID1, there is only one pair of ID2-ID3.

Haikuo
Onyx | Level 15

In that case, I think you already have your answer by

Update: or for a one step solution,

data temp;

input ID1 ID2 ID3 X Y;

datalines;

11 21 . 41 51

11 . 21 42 52

12 22 . 43 53

12 . 22 44 54

12 . 24 45 55

12 24 . 1 2

;

run;

proc sql;

  create table want as

  select *, y-x as dif as diff from temp (drop=id3 y where=(not missing(id2))) a,

temp(drop=id2 x where=(not missing(id3))) b

where a.id1=b.id1

and a.id2=b.id3

;

quit;

Haikuo

Anotherdream
Quartz | Level 8

Nice. Glad I was on the right track.

Also, that is a crazy secondary sql pass Haikou! I didn't know you could do a multiple select with a where statement to join. Thanks for posting that

NonSleeper
Quartz | Level 8

Thanks. I guess it's time to have some sql education. That seems to be the gold standard for sorts of question I've made.

Anotherdream
Quartz | Level 8

Actually you don't need sql at all for this question. Note that in my first query I split the data using a data step, and then I choose to merge the data using sql. It's just as efficient to simply use a merge statement to re-join the data (you have to re-name the columns).

However, Hai.Kou's query is only one step, instead of the 4 you would need to merge the data. On large datasets however I think the merge might be more efficient since the join is joining every element together and then filtering it down. That is my guess tho, as I have never seen a sql query like this (I use sql server, which you can't do this type of query in).

But yeah, ANYTHING you can do in sql you can do in a number of data steps, sometimes the sql is just easier to write.

Ksharp
Super User

I recommended using dataset for your situation caring about order of obs.

data temp;
input ID1 ID2 ID3 X Y;
datalines;
11     21     .     41     51
11     .        21  42     52
12     22     .     43     53
12     .       22   44     54
12     .       24   45     55
12     24    .      1       2
;
run;
data want;
 merge temp(keep=id1 id2 x where=(id2 is not missing)) temp(keep=id3 y where=(id3 is not missing));
 dif=y-x;
run;

Ksharp

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 672 views
  • 6 likes
  • 4 in conversation