Solved: Merging data in a long format

Iron_grief · Posted 10-16-2011 07:21 PM

Greetings, I hope to get help on the following issue:

I have a dataset arranged in a long format, with repeated obervations for each subject; something like this:

PATIENT ID VISIT DATE VAR X

1 1 01/01/2005 ...

1 2 01/07/2005 ...

1 3 ... ...

2 1 ... ...

2 2 :.. ...

2 3 ... ...

Then I need to merge it with another dataset which has a similar structure but contains variable Y. I can't use the "merge" statement in a data step using the patient ID as the key because it is repeated and I can't use VISIT as a key neither because it's not unique. Does anyone has any idea on how to merge them?

Thank you,

Manuel

art297 · Posted 10-16-2011 07:39 PM

Can't you just use "patient_id visit" as the by variable?

Otherwise, it would help if you provide examples of both datasets, the code you are trying, and what you want to accomplish.

View solution in original post

art297 · Posted 10-16-2011 07:39 PM

Can't you just use "patient_id visit" as the by variable?

Otherwise, it would help if you provide examples of both datasets, the code you are trying, and what you want to accomplish.

Iron_grief · Posted 10-16-2011 08:03 PM

I didn't know that I could use more than one variable in the BY statement. Indeed the "unique" key is the combination of both PATIENT_ID and VISIT variables.

I will try it to see if it works. Otherwise, I will give more specific info on my datasets and code.

EDIT: it's working. Thank you so much.

Manuel

art297 · Posted 10-16-2011 08:06 PM

Yes you definitely can include multiple variables in a by statement. Let the Forum know if that corrects your problem.

Iron_grief · Posted 10-19-2011 01:21 PM

Thank you.

Is there a way to create a variable that ranks each observation in order of time? In example, from the following dataset:

patient_id date_of_measurement value rank

1 40608 1.5 1

1 40707 1.8 2

2 31609 2.0 1

2 32705 2.0 2

2 40606 2.2 3

The "rank" variable measures the temporal order of measurements within each patient; it is absent in the original dataset and should be created.

Many thanks.

art297 · Posted 10-19-2011 01:29 PM

I think you are only asking for something like:

data have;

input patient_id date_of_measurement value;

cards;

1 40608 1.5

1 40707 1.8

2 31609 2.0

2 32705 2.0

2 40606 2.2

;

proc sort data=have;

by patient_id date_of_measurement;

run;

data want;

set have;

by patient_id;

if first.patient_id then rank=1;

else rank+1;

run;

Iron_grief · Posted 10-19-2011 02:12 PM

Exactly. Another way would be PROC RANK, but yours can be embedded in my data step, so I will go for this.

Thank you.

art297 · Posted 10-19-2011 02:30 PM

proc rank would give you ties on same date entries. Would you want that?

Iron_grief · Posted 10-19-2011 03:03 PM

I shouldn't have such data in my dataset. But then again, it might serve as a double check if I run PROC RANK and then look for ties within the same subject.

Even if performing the procedure inside the data step just seems more elegant to me (not a big issue, I know).

Merging data in a long format

Merging data in a long format

Merging data in a long format

Re: Merging data in a long format

Merging data in a long format

Re: Merging data in a long format

Re: Merging data in a long format

Re: Merging data in a long format

Re: Merging data in a long format

Re: Merging data in a long format

Register Today!

SAS Training: Just a Click Away