BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
USCSS_Nostromo
Calcite | Level 5

Dear SAS community,

 

I have collected data from a single source simultaneously with three instruments. One instrument provides me with a value ("HR") each minute ("Time"). The other instrument provides me with a value ("stages") every 20 seconds ("epoch"), and the third instrument provides me with a value (RR_s) at varying frequency per 20 seconds (also called "epoch"). I wish to merge these three data sets by some common time variable as well as ID, such that they synchronize as in the following example:

 

Time  HR  ID          epoch stages RR_S

0:02  55  RES005 1         W         0.78523

0:02  55  RES005 1         W         0.85236

0.02  55  RES005 1         W         0.65215

0:03  62  RES005 2         1          0.45695

0.03  62  RES005 2         1          0.56985

0:03  62  RES005 2         1          0.56987

0:04  53  RES005 3         R         0.78958

0:04  53  RES005 3         R         0.89652

0:04  53  RES005 3         R         0.75698

. . .

 

The variable "RR_s" has the most observations per unit time compared to the other two data types ("HR" and "stages"), so in practice there would be many more observations per 20 second epoch than what I have shown above for illustrative purposes.

 

Attached are samples of the data sets I am trying to merge together. The files are named according to the type of data they contain, i.e., "HR", "Stages" and "RR_s".

 

Can anyone lend me a hand? I am running SAS version 9.4. I would really appreciate any help!

 

I have tried conventionally sorting and merging the data in the SAS data step, but this did not work. My syntax is below.

 


Proc sort data = HR; by ID Night time; run;
Proc sort data = Stages; by ID Night epoch; run;
Proc sort data = RR_s; by ID Night epoch; run;

 

data MasterSleepHR;
merge HR Stages RR_s;
by ID Night ;
run;

 

I have also tried merging the files individually with each other in different order but this did not work either. I have also tried to rename the "epoch" variable in the "RR_s" data set to an arbitrary name with the reasoning that it may be conflicting somehow with the variable of the same name in the "HR" data set. This did not solve my problem.

 

I then tried to create variable that would count upward from the start of a data set by 3's by ID, such that I could create a time variable that split up each data set into three units per minute of 20 seconds so that I could make "time" in minute duration synchronize with "epoch" of 20 second duration. This still would not have made the "epoch" time unit of the "RR_s" data synchronize with the other two data types. It did not work anyway. Here is what I wrote:

 

data commonVariable; set HR;
by ID notsorted;
if time ne "." then
do ;
common+3;
if last.ID;
end;

run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
TomKari
Onyx | Level 15

Yes, I thought this might be the case. Fortunately, SAS has a "datetime" data type which is the number of seconds since 1JAN1960. I think that consolidating the date and the time corrects the issue. Take a look and let me know.

 

One other issue. Because I'm using inner joins, cases where there are no epoch values for a time value won't have the data from the HR table. Is this acceptable to you?

 

   Tom

View solution in original post

12 REPLIES 12
TomKari
Onyx | Level 15

I don't find it immediately obvious how these datasets relate. Let me ask a few questions:

 

Do RR_s and Stages have a firm match on ID, night, and epoch? If that's the case, you can use this code to join them (I find PROC SQL is a better solution for this kind of problem than data step merges).

 

proc sql noprint;
	create table Merged1 as
		select r.ID, r.Night, r.epoch, s.stages, r.RR_s
			from RR_S r full outer join Stages s
				on(r.ID = s.ID and r.Night = s.Night and r.epoch = s.epoch)
			order by r.ID, r.Night, r.epoch;
quit;

If that's the case, how do ID, night, and x on HR relate to ID, night, and epoch on the merged file?

 

Using ID of RES005 and night of BN, the first unique values of Time on HR are:
0:00:00
0:02:00
0:03:00
0:04:00
0:05:00
0:06:00
0:07:00
0:08:00
0:09:00
0:10:00

 

while the first unique values of epoch on the merged file are 1 through 10, each with many records. What is the logic to match these variables?

 

Tom

USCSS_Nostromo
Calcite | Level 5

Hi TomKari

 

Thank you for your reply.

 

I have noticed mistakes in the data sets that I posted here initially. I am going to fix these and re-post. When I do so, I will try to answer your questions. I am sorry about this.

 

Best,

 

Ian

 

USCSS_Nostromo
Calcite | Level 5

Hello again,

 

Sorry for the confusion, and thank you for your help.

 

The three data sets were collected at night from a single human subject as he slept.

 

(1) The data set "Stages" contains electroencephalogram (EEG) data. The data are divided into 20 second epochs. Each epoch contains one value, the sleep stage. In the example file attached here, the subject fell asleep at epoch 618 and awoke at epoch 2159.

 

(2) The data set "RR_s" contains electrocardiogram (EKG) data. The data are also divided into 20 second epochs; however, in this case, instead of one value per epoch, there are multiple values per epoch. This is because these data reflect all the individual heart beats during each twenty second epoch.

 

(3) The final data set, "HR", contains photoplethysmogram data. In this data set, there is a single mean heart rate value per 60 seconds by clock time. The clock time is synchronized to the start and end of the subject's sleep episode, i.e., clock time 23:25 aligns with the start of epoch 618, and clock time 7:59 aligns with the start of epoch 2159.

 

I need to merge these three data sets into one master data set, such that the epochs and clock times for each observation correctly synchronize.

 

Attached are files containing example data sets.

 

Please let me know if you have any questions. Thank you again for any help.

 

Sincerely,

 

Ian

TomKari
Onyx | Level 15

So if I'm reading this correctly, the first Stages record for "ID/Night" RES005/BN corresponds to the first HR record for the same "ID/Night", so the "epoch" of 618 matches the "time" of 23:25:00.

 

If this is the case, can we just generate the matching time for each epoch, e.g.

 

time        epoch

23:25:00 618
23:25:00 619
23:25:00 620
23:26:00 621
23:26:00 622
23:26:00 623
...

 

This would definitely give us the ability to match the data.

 

Let me know if this assumption is correct, and then we can tackle the programming. I don't think it'll be terribly complicated.

 

Tom

USCSS_Nostromo
Calcite | Level 5

Hi again Tom,

 

Yes, this assumption is correct!

 

Best,

Ian

 

 

 

 

TomKari
Onyx | Level 15

I've attached a program that's a first attempt at doing what we've been discussing. Some comments:

 

1. I've included the data that you provided, so you can see the results that I got. Note that I changed Channel (changed blank to a _) just to make my life easier.

 

2. There are a couple of places where I'm not sure if a sort would be a good idea or not. I put them in, commented out.

 

3. I haven't done anything with the Date field. If any of the Time values go over midnight, it will be necessary to modify the code to take that into account.

 

4. It's essential that Stages has only one record per combination of ID/night/epoch, otherwise we're into a many-to-many join, which will make a complete mess of things.

 

I strongly suggest you test and verify this very carefully, as I'm not at all familiar with your data concepts.

Tom

USCSS_Nostromo
Calcite | Level 5

Hi Tom,

 

Thank you so much for this!

 

I ran the code you wrote. It works great, but there is just one issue. The data set is limited to epochs 618 - 722 and clock times 23:25 - 23:59 

 

I think this is what you warned about regarding clock times not being able to go past midnight in point number 3 of your last reply. It is almost always the case in my data sets, however, that the clock times go past midnight.

 

I have attached the output as a .csv file. What is your opinion? Can you help further?

 

Best,

 

Ian

TomKari
Onyx | Level 15

Yes, I thought this might be the case. Fortunately, SAS has a "datetime" data type which is the number of seconds since 1JAN1960. I think that consolidating the date and the time corrects the issue. Take a look and let me know.

 

One other issue. Because I'm using inner joins, cases where there are no epoch values for a time value won't have the data from the HR table. Is this acceptable to you?

 

   Tom

USCSS_Nostromo
Calcite | Level 5
Hi Tom,
This sounds very good. It is okay if the HR data are lost when there is no epoch.

Best,
Ian
USCSS_Nostromo
Calcite | Level 5

Hi Tom,

 

This works! I am going to mark this as solved. Thank you once again for all your help.

 

Best,

 

Ian

TomKari
Onyx | Level 15

I'm pleased that I was able to help. It's a very interesting problem!

 

Please take a look at the private note that I sent you.

 

Tom

PGStats
Opal | Level 21

Assuming that the variables time_midpoint_s, elapse and time are in sync, you could merge the measurements with :

 

data rr_s;
infile "&sasforum\datasets\rr_s.csv" dlm="," firstobs=2;
input ID :$8. Night :$8. Channel :$8. Time_Midpoint_s RR_s epoch;
run;

data stages;
infile "&sasforum\datasets\stages.csv" firstobs=2 dlm=",";
input ID :$8. Night :$8. epoch stages :$8. elapse start duration event interval Method :$8.;
run;

data hr;
infile "&sasforum\datasets\hr.csv" dlm="," firstobs=2;
input night :$8. Date :mmddyy. Time :time5. HR ID :$8.;
run;

data all;
set 
    rr_s  (keep=ID night time_midpoint_s rr_s rename=time_midpoint_s=time)
    stages(keep=ID night elapse stages rename=elapse=time)
    hr    (keep=ID night time HR);
format time hhmm.;
run;

Proc sort data = all; by ID Night time; run;

data want;
update all(obs=0) all; by ID night;
output;
run;
PG

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to connect to databases in SAS Viya

Need to connect to databases in SAS Viya? SAS’ David Ghan shows you two methods – via SAS/ACCESS LIBNAME and SAS Data Connector SASLIBS – in this video.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 12 replies
  • 2861 views
  • 0 likes
  • 3 in conversation