10-19-2016 12:36 PM
This is my first time posting here, and I'm fairly new to SAS... and I couldn't find another question about this.
Basically I have a data set with a list of ID numbers. Each ID number has timestamps listed after it. For example:
000134 1/1/2016 4:53 AM 2/3/2016 5:01 PM 5/12/2016 3:12 PM 6/1/2016 6:12 AM
000135 1/2/2016 11:04 AM 1/15/2016 5:51 AM
000136 1/1/2016 5:41 AM 2/17/2016 1:29 PM
Each timestamp signifies something about the ID number (i.e. "statuses") . What I want to do is plot the total number of IDs that fall in the same time interval, producing a time vs sum graph. For example, in the above sample data, for 1/1/2016, I would plot 2 totals. From 1/2/2106 to 1/14/2016, I would plot 3 totals. From 1/15/2016 to 2/3/2016 I would plot 2, and from 2/4/2016 to 2/17/2016 I would only plot 1 total.
Is this possible in SAS EG?
10-19-2016 04:18 PM
First thing is to confirm that your data is already a SAS data set.
Next would be to verify that the time stamps are actually SAS datetime values. Proc contents or table properties should tell you that the variables are numeric with a format that may be something like DATETIME20. You want datetime or date and time valued variables so that the concept of time interval makes some sense.
If you have a varying number of stamps what do the variable names look like?
The next step would be make the data look like
000134 1/1/2016 4:53 AM
000134 2/3/2016 5:01 PM
000134 5/12/2016 3:12 PM
Which will allow you to summarize counts on the date value of the time stamp.
Proc freq with the time stamp varible and a suitable format would get the counts and may generate a plot you like.
Or summarize with counts of the times stamp and then send to a plotting procedure. If you have lots of dates you might look for a NEEDLE plot.
10-19-2016 04:44 PM
Thank you ballardw.
Basically the x-axis of my theoretical graph is continuous in time or descrete in intervals of 24 hours. I thought about using Proc Freq to sum the same dates (reformat the data to just date and exclude time), but how would i account for days in between 2 timestamps?
For example, in the following:
How would I account for plotting the count on 1/15/2016, which would have 2 totals (1/1 to 2/3 for 000134 and 1/9 to 1/29 for 000135)? Hopefully I'm making sense here.
10-19-2016 07:59 PM
How would you like to account for them? The graphing procedures do not need a value for every date in an interval.
Here's a data simulation that generates a set, summarizes a count by date and graphs it.
data example; do id = 123 to 128; rancount = ceil(ranuni(123)*20); do j= 1 to rancount; date = '01JAN2016'd + ceil(ranuni(456)*70); output; end; end; drop rancount j; format date mmddyy10.; run; proc freq data=example noprint; tables date/ out=datecount; run; proc sgplot data=datecount; scatter x=date y=count; run;
This should generate some gaps in the dates. Most of the dates with data are likely to only have 1 or 2 counts but you should get a few 3s and 4s.
You would have to investigate axis control statements for different axis appearance.
10-19-2016 08:41 PM
This solution definitely has the right idea.
This is the data set (after sorting) that was produced from your code.
Let's say each two timestamp represents an interval we want to examine. So for example, for ID 123, (1/10, 1/14) is the first interval, (1/15, 1/17) is the second interval, and so on. According to this rule, on 1/11 there should be 5 counts of unique ID numbers (123, 124, 125, 126, and 127--NOT 128). I basically want this count reflected on the graph.
Here is the graph that was produced from the data set:
I don't know if I'm making this more complicated than it should be....
It's okay if there is no straightforward solution. Suggestions are still appreciated.