Dear SAS community,
I have collected data from a single source simultaneously with three instruments. One instrument provides me with a value ("HR") each minute ("Time"). The other instrument provides me with a value ("stages") every 20 seconds ("epoch"), and the third instrument provides me with a value (RR_s) at varying frequency per 20 seconds (also called "epoch"). I wish to merge these three data sets by some common time variable as well as ID, such that they synchronize as in the following example:
Time HR ID epoch stages RR_S
0:02 55 RES005 1 W 0.78523
0:02 55 RES005 1 W 0.85236
0.02 55 RES005 1 W 0.65215
0:03 62 RES005 2 1 0.45695
0.03 62 RES005 2 1 0.56985
0:03 62 RES005 2 1 0.56987
0:04 53 RES005 3 R 0.78958
0:04 53 RES005 3 R 0.89652
0:04 53 RES005 3 R 0.75698
. . .
The variable "RR_s" has the most observations per unit time compared to the other two data types ("HR" and "stages"), so in practice there would be many more observations per 20 second epoch than what I have shown above for illustrative purposes.
Attached are samples of the data sets I am trying to merge together. The files are named according to the type of data they contain, i.e., "HR", "Stages" and "RR_s".
Can anyone lend me a hand? I am running SAS version 9.4. I would really appreciate any help!
I have tried conventionally sorting and merging the data in the SAS data step, but this did not work. My syntax is below.
Proc sort data = HR; by ID Night time; run;
Proc sort data = Stages; by ID Night epoch; run;
Proc sort data = RR_s; by ID Night epoch; run;
data MasterSleepHR;
merge HR Stages RR_s;
by ID Night ;
run;
I have also tried merging the files individually with each other in different order but this did not work either. I have also tried to rename the "epoch" variable in the "RR_s" data set to an arbitrary name with the reasoning that it may be conflicting somehow with the variable of the same name in the "HR" data set. This did not solve my problem.
I then tried to create variable that would count upward from the start of a data set by 3's by ID, such that I could create a time variable that split up each data set into three units per minute of 20 seconds so that I could make "time" in minute duration synchronize with "epoch" of 20 second duration. This still would not have made the "epoch" time unit of the "RR_s" data synchronize with the other two data types. It did not work anyway. Here is what I wrote:
data commonVariable; set HR;
by ID notsorted;
if time ne "." then
do ;
common+3;
if last.ID;
end;
run;
Yes, I thought this might be the case. Fortunately, SAS has a "datetime" data type which is the number of seconds since 1JAN1960. I think that consolidating the date and the time corrects the issue. Take a look and let me know.
One other issue. Because I'm using inner joins, cases where there are no epoch values for a time value won't have the data from the HR table. Is this acceptable to you?
Tom
I don't find it immediately obvious how these datasets relate. Let me ask a few questions:
Do RR_s and Stages have a firm match on ID, night, and epoch? If that's the case, you can use this code to join them (I find PROC SQL is a better solution for this kind of problem than data step merges).
proc sql noprint; create table Merged1 as select r.ID, r.Night, r.epoch, s.stages, r.RR_s from RR_S r full outer join Stages s on(r.ID = s.ID and r.Night = s.Night and r.epoch = s.epoch) order by r.ID, r.Night, r.epoch; quit;
If that's the case, how do ID, night, and x on HR relate to ID, night, and epoch on the merged file?
Using ID of RES005 and night of BN, the first unique values of Time on HR are:
0:00:00
0:02:00
0:03:00
0:04:00
0:05:00
0:06:00
0:07:00
0:08:00
0:09:00
0:10:00
while the first unique values of epoch on the merged file are 1 through 10, each with many records. What is the logic to match these variables?
Tom
Hi TomKari
Thank you for your reply.
I have noticed mistakes in the data sets that I posted here initially. I am going to fix these and re-post. When I do so, I will try to answer your questions. I am sorry about this.
Best,
Ian
Hello again,
Sorry for the confusion, and thank you for your help.
The three data sets were collected at night from a single human subject as he slept.
(1) The data set "Stages" contains electroencephalogram (EEG) data. The data are divided into 20 second epochs. Each epoch contains one value, the sleep stage. In the example file attached here, the subject fell asleep at epoch 618 and awoke at epoch 2159.
(2) The data set "RR_s" contains electrocardiogram (EKG) data. The data are also divided into 20 second epochs; however, in this case, instead of one value per epoch, there are multiple values per epoch. This is because these data reflect all the individual heart beats during each twenty second epoch.
(3) The final data set, "HR", contains photoplethysmogram data. In this data set, there is a single mean heart rate value per 60 seconds by clock time. The clock time is synchronized to the start and end of the subject's sleep episode, i.e., clock time 23:25 aligns with the start of epoch 618, and clock time 7:59 aligns with the start of epoch 2159.
I need to merge these three data sets into one master data set, such that the epochs and clock times for each observation correctly synchronize.
Attached are files containing example data sets.
Please let me know if you have any questions. Thank you again for any help.
Sincerely,
Ian
So if I'm reading this correctly, the first Stages record for "ID/Night" RES005/BN corresponds to the first HR record for the same "ID/Night", so the "epoch" of 618 matches the "time" of 23:25:00.
If this is the case, can we just generate the matching time for each epoch, e.g.
time epoch
23:25:00 618
23:25:00 619
23:25:00 620
23:26:00 621
23:26:00 622
23:26:00 623
...
This would definitely give us the ability to match the data.
Let me know if this assumption is correct, and then we can tackle the programming. I don't think it'll be terribly complicated.
Tom
Hi again Tom,
Yes, this assumption is correct!
Best,
Ian
I've attached a program that's a first attempt at doing what we've been discussing. Some comments:
1. I've included the data that you provided, so you can see the results that I got. Note that I changed Channel (changed blank to a _) just to make my life easier.
2. There are a couple of places where I'm not sure if a sort would be a good idea or not. I put them in, commented out.
3. I haven't done anything with the Date field. If any of the Time values go over midnight, it will be necessary to modify the code to take that into account.
4. It's essential that Stages has only one record per combination of ID/night/epoch, otherwise we're into a many-to-many join, which will make a complete mess of things.
I strongly suggest you test and verify this very carefully, as I'm not at all familiar with your data concepts.
Tom
Hi Tom,
Thank you so much for this!
I ran the code you wrote. It works great, but there is just one issue. The data set is limited to epochs 618 - 722 and clock times 23:25 - 23:59
I think this is what you warned about regarding clock times not being able to go past midnight in point number 3 of your last reply. It is almost always the case in my data sets, however, that the clock times go past midnight.
I have attached the output as a .csv file. What is your opinion? Can you help further?
Best,
Ian
Yes, I thought this might be the case. Fortunately, SAS has a "datetime" data type which is the number of seconds since 1JAN1960. I think that consolidating the date and the time corrects the issue. Take a look and let me know.
One other issue. Because I'm using inner joins, cases where there are no epoch values for a time value won't have the data from the HR table. Is this acceptable to you?
Tom
Hi Tom,
This works! I am going to mark this as solved. Thank you once again for all your help.
Best,
Ian
I'm pleased that I was able to help. It's a very interesting problem!
Please take a look at the private note that I sent you.
Tom
Assuming that the variables time_midpoint_s, elapse and time are in sync, you could merge the measurements with :
data rr_s;
infile "&sasforum\datasets\rr_s.csv" dlm="," firstobs=2;
input ID :$8. Night :$8. Channel :$8. Time_Midpoint_s RR_s epoch;
run;
data stages;
infile "&sasforum\datasets\stages.csv" firstobs=2 dlm=",";
input ID :$8. Night :$8. epoch stages :$8. elapse start duration event interval Method :$8.;
run;
data hr;
infile "&sasforum\datasets\hr.csv" dlm="," firstobs=2;
input night :$8. Date :mmddyy. Time :time5. HR ID :$8.;
run;
data all;
set
rr_s (keep=ID night time_midpoint_s rr_s rename=time_midpoint_s=time)
stages(keep=ID night elapse stages rename=elapse=time)
hr (keep=ID night time HR);
format time hhmm.;
run;
Proc sort data = all; by ID Night time; run;
data want;
update all(obs=0) all; by ID night;
output;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.
Find more tutorials on the SAS Users YouTube channel.