BookmarkSubscribeRSS Feed
7maiasmith
Calcite | Level 5

I have two datasets: one (Dataset1) with the measures start_datetime and end_datetime:

start_datetime     end_datetime

12345                  12349

12355                  12370

...

 

and one (Dataset2) with the measures start_datetime and datetime, which are identical.  This one contains all datetimes within the timespan I'm interested in.

start_datetime     end_datetime

12345                  12345

12346                  12346

...

 

I want to merge them on start_time like so:

 

and get

datetime     start_datetime     end_datetime

12344        12345                  12349

12345        12345                  12349

12346        12345                  12349

12347        12345                  12349

12348        12345                  12349

12349        12345                  12349

12350        12355                  12370

...

I can then say

if datetime >=start_datetime and datetime < end_datetime then indicator=1; else indicator=0; run;

 

However, the merge doesn't work.  If I merge on start_time I get

datetime     start_datetime     end_datetime

12344        12345                  .

12345        12345                  12349

12346        12345                  .

12347        12345                  .

12348        12345                  .

12349        12345                  .

12350        12355                  .

...

 

This is the same problem described here, but the solution doesn't work.

https://communities.sas.com/t5/SAS-Programming/Merging-one-to-many-creates-missing-values/td-p/24602...

7 REPLIES 7
Astounding
PROC Star

Some things are just too difficult to visualize...

 

What program did you run to try to "merge" these together?

 

Why should 12344 be a match with the start of 12345 and end of 12349?

 

Why should 12350 be a match with the start of 12355 an end of 12370?

 

What are you ultimately trying to accomplish here?  There might be a much simpler way to get there.

7maiasmith
Calcite | Level 5

 

 

Top to bottom:

>What program did you run to try to "merge" these together? 

I have another dataset, datasetA, that has one observation for each datetime in the period of interest.  The code I ran was:

proc sort data=dataset1 */*/*/the one I have*/*/;; by start_time; run;
data datasetA; set datasetA; start_time=datetime; run;
proc sort data=datasetA; by start_time; run;

data datasetB; merge dataset1 datasetA; by start_time; run;

data datasetB; set datasetB;
if datetime>= start_time and datetime <end_time then activity=1; else activity=0; run;

 

>Why should 12344 be a match with the start of 12345 and end of 12349?

>Why should 12350 be a match with the start of 12355 an end of 12370?

They shouldn't.  That's a glitch.

 

>What are you ultimately trying to accomplish here?  There might be a much simpler way to get there.

No doubt there is.  What I want to do is be able to run the last two lines of code:

data datasetB; set datasetB;
if datetime>= start_time and datetime <end_time then activity=1; else activity=0; run;

and get a 1 or 0 depending on whether that time falls within the given span.

 

Astounding
PROC Star

If I'm reading this correctly, you need to compare each datetime to a set of intervals (not just one interval).  Your new variable should be 1 when the datetime falls within any of the intervals.

 

Here's a way to do that which creates your new variable as character "1" or "0".  I'll mention more about that at the end.

 

Start by creating a format from your sets of intervals:

 

data intervals;

set dataset1 end=done;

start = start_time;

end = end_time;   

label='1';

fmtname = 'interval';

output;

if done;

hlo='O';

label='0';

output;

run;

 

This creates a format that identifies all the intervals.  (It will also double-check and give you and error if any of the intervals overlap so you can clean them up if necessary.)  There is the possibility that you would use this statement instead:

 

end = end_time - 1;

 

Your comparison uses < end_time rather than <= end_time and I can't really tell if that is intentional or not.

 

At any rate, create a format to identify the intervals:

 

proc format cntlin=intervals;

run;

 

Then use the format to identify the individual activities:

 

data want;

set datasetA;

activity = put(datetime, interval.);

run;

 

Technically, if you really need your new variable as numeric instead of character "1" and "0", it would be possible to create an informat instead of a format by changing one of the variables within the INTERVALS data set.

mkeintz
PROC Star

In addition to the questions put by @Astounding, exactly what do you mean by "but the solution doesn't work"?   What did you get from the program that was different than what you expected?

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
7maiasmith
Calcite | Level 5

I replied to @Astounding.  But what I meant was that I started with this:

start_time           end_time

12345                12347  

12348                12355

...

 

and wanted to get this:

time                activity

12344               0

12345               1

12346               1

12347               0

12348               1

....

 

but instead got this:

time                activity

12344               .

12345               1

12346               .

12347               .

12348               1

....

 

I thought the previously described MERGE would do it, but is there a simpler way?

Astounding
PROC Star

I had asked four questions.

 

You answered none of them.

 

But you changed the variables in your data sets so they no longer relate to your original post.

 

I can wait.

7maiasmith
Calcite | Level 5

Sorry for the confusion.  The original dataset had only datetime and minutes, from which I got start_time and end_time.  Original post updated.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 916 views
  • 4 likes
  • 3 in conversation