Greetings, I hope to get help on the following issue:
I have a dataset arranged in long format, with repeated observations for each subject; something like this:
PATIENT_ID  VISIT  DATE        VAR_X
1           1      01/01/2005  ...
1           2      01/07/2005  ...
1           3      ...         ...
2           1      ...         ...
2           2      ...         ...
2           3      ...         ...
I need to merge it with another dataset that has a similar structure but contains variable Y. I can't use the MERGE statement in a data step with the patient ID as the key, because it is repeated, and I can't use VISIT as the key either, because it is not unique. Does anyone have any idea how to merge them?
Thank you,
Manuel
Can't you just use "patient_id visit" as the by variable?
Otherwise, it would help if you provided examples of both datasets, the code you are trying, and what you want to accomplish.
I didn't know that I could use more than one variable in the BY statement. Indeed the "unique" key is the combination of both PATIENT_ID and VISIT variables.
I will try it to see if it works. Otherwise, I will give more specific info on my datasets and code.
EDIT: it's working. Thank you so much.
Manuel
Yes, you definitely can include multiple variables in a BY statement. Let the Forum know if that corrects your problem.
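For anyone finding this thread later, here is a minimal sketch of a merge on a two-variable key. The dataset names visits_x and visits_y and the flag variable matched are hypothetical, chosen only for illustration; the key variables patient_id and visit follow the posts above. Both inputs have to be sorted by the same BY variables before the merge.

```sas
/* Hypothetical inputs: visits_x holds VAR X, visits_y holds variable Y. */
proc sort data=visits_x;
  by patient_id visit;
run;

proc sort data=visits_y;
  by patient_id visit;
run;

data merged;
  /* IN= creates temporary flags showing which dataset contributed a row */
  merge visits_x (in=in_x) visits_y (in=in_y);
  by patient_id visit;
  /* matched = 1 when the patient/visit pair exists in both datasets */
  matched = (in_x and in_y);
run;
```

The matched flag is optional; it is only there to make it easy to spot patient/visit pairs that appear in one dataset but not the other.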
Thank you.
Is there a way to create a variable that ranks each observation in order of time? For example, from the following dataset:
patient_id date_of_measurement value rank
1 40608 1.5 1
1 40707 1.8 2
2 31609 2.0 1
2 32705 2.0 2
2 40606 2.2 3
The "rank" variable measures the temporal order of measurements within each patient; it is absent in the original dataset and should be created.
Many thanks.
I think you are only asking for something like:
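/* Example data without the rank variable */
data have;
  input patient_id date_of_measurement value;
  cards;
1 40608 1.5
1 40707 1.8
2 31609 2.0
2 32705 2.0
2 40606 2.2
;

/* Sort so that measurements are in order within each patient */
proc sort data=have;
  by patient_id date_of_measurement;
run;

data want;
  set have;
  by patient_id;
  /* Restart the counter at each new patient, otherwise increment it */
  if first.patient_id then rank=1;
  else rank+1;
run;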
Exactly. Another way would be PROC RANK, but yours can be embedded in my data step, so I will go for this.
Thank you.
PROC RANK would give you ties for entries with the same date. Would you want that?
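To make the tie behaviour concrete, here is a minimal PROC RANK sketch, assuming the have dataset from the earlier post. The output dataset name ranked and the choice of TIES=LOW are illustrative assumptions, not something the original poster specified.

```sas
/* BY-group processing in PROC RANK needs the data sorted by the BY variable */
proc sort data=have;
  by patient_id date_of_measurement;
run;

proc rank data=have out=ranked ties=low;
  by patient_id;              /* rank separately within each patient */
  var date_of_measurement;    /* variable whose order defines the rank */
  ranks rank;                 /* name of the new rank variable */
run;
```

With TIES=LOW, two measurements on the same date within a patient share the lower rank. The default, TIES=MEAN, would assign them the average of the tied ranks instead, so the result can be non-integer.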
I shouldn't have such data in my dataset. But then again, it might serve as a double check if I run PROC RANK and then look for ties within the same subject.
Even so, performing the ranking inside the data step just seems more elegant to me (not a big issue, I know).
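If the double check is the main goal, it could also be folded into the data step itself. The following is only a sketch under the assumption that have is already sorted by patient_id and date_of_measurement; the second output dataset dup_dates is a hypothetical name introduced here.

```sas
data want dup_dates;
  set have;
  by patient_id date_of_measurement;
  /* Same running counter as in the data step posted earlier */
  if first.patient_id then rank = 1;
  else rank + 1;
  output want;
  /* A date that is not both first and last in its BY group is repeated
     within the patient, so copy that row to the check dataset */
  if not (first.date_of_measurement and last.date_of_measurement)
    then output dup_dates;
run;
```

If dup_dates comes out empty, no patient has two measurements on the same date, and the data-step rank and a PROC RANK result should agree.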