I have two datasets: one (Dataset1) with the measures start_datetime and end_datetime:
start_datetime end_datetime
12345 12349
12355 12370
...
and one (Dataset2) with the measures start_datetime and datetime, which are identical. This one contains all datetimes within the timespan I'm interested in.
start_datetime end_datetime
12345 12345
12346 12346
...
I want to merge them on start_time like so:
and get
datetime start_datetime end_datetime
12344 12345 12349
12345 12345 12349
12346 12345 12349
12347 12345 12349
12348 12345 12349
12349 12345 12349
12350 12355 12370
...
I can then say
if datetime >=start_datetime and datetime < end_datetime then indicator=1; else indicator=0; run;
However, the merge doesn't work. If I merge on start_time I get
datetime start_datetime end_datetime
12344 12345 .
12345 12345 12349
12346 12345 .
12347 12345 .
12348 12345 .
12349 12345 .
12350 12355 .
...
This is the same problem described here, but the solution doesn't work.
Some things are just too difficult to visualize...
What program did you run to try to "merge" these together?
Why should 12344 be a match with the start of 12345 and end of 12349?
Why should 12350 be a match with the start of 12355 an end of 12370?
What are you ultimately trying to accomplish here? There might be a much simpler way to get there.
Top to bottom:
>What program did you run to try to "merge" these together?
I have another dataset, datasetA, that has one observation for each datetime in the period of interest. The code I ran was:
proc sort data=dataset1 */*/*/the one I have*/*/;; by start_time; run;
data datasetA; set datasetA; start_time=datetime; run;
proc sort data=datasetA; by start_time; run;
data datasetB; merge dataset1 datasetA; by start_time; run;
data datasetB; set datasetB;
if datetime>= start_time and datetime <end_time then activity=1; else activity=0; run;
>Why should 12344 be a match with the start of 12345 and end of 12349?
>Why should 12350 be a match with the start of 12355 an end of 12370?
They shouldn't. That's a glitch.
>What are you ultimately trying to accomplish here? There might be a much simpler way to get there.
No doubt there is. What I want to do is be able to run the last two lines of code:
data datasetB; set datasetB;
if datetime>= start_time and datetime <end_time then activity=1; else activity=0; run;
and get a 1 or 0 depending on whether that time falls within the given span.
If I'm reading this correctly, you need to compare each datetime to a set of intervals (not just one interval). Your new variable should be 1 when the datetime falls within any of the intervals.
Here's a way to do that which creates your new variable as character "1" or "0". I'll mention more about that at the end.
Start by creating a format from your sets of intervals:
data intervals;
set dataset1 end=done;
start = start_time;
end = end_time;
label='1';
fmtname = 'interval';
output;
if done;
hlo='O';
label='0';
output;
run;
This creates a format that identifies all the intervals. (It will also double-check and give you and error if any of the intervals overlap so you can clean them up if necessary.) There is the possibility that you would use this statement instead:
end = end_time - 1;
Your comparison uses < end_time rather than <= end_time and I can't really tell if that is intentional or not.
At any rate, create a format to identify the intervals:
proc format cntlin=intervals;
run;
Then use the format to identify the individual activities:
data want;
set datasetA;
activity = put(datetime, interval.);
run;
Technically, if you really need your new variable as numeric instead of character "1" and "0", it would be possible to create an informat instead of a format by changing one of the variables within the INTERVALS data set.
In addition to the questions put by @Astounding, exactly what do you mean by "but the solution doesn't work"? What did you get from the program that was different than what you expected?
I replied to @Astounding. But what I meant was that I started with this:
start_time end_time
12345 12347
12348 12355
...
and wanted to get this:
time activity
12344 0
12345 1
12346 1
12347 0
12348 1
....
but instead got this:
time activity
12344 .
12345 1
12346 .
12347 .
12348 1
....
I thought the previously described MERGE would do it, but is there a simpler way?
I had asked four questions.
You answered none of them.
But you changed the variables in your data sets so they no longer relate to your original post.
I can wait.
Sorry for the confusion. The original dataset had only datetime and minutes, from which I got start_time and end_time. Original post updated.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.