BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ZacLopresti
Obsidian | Level 7

Hi all,

 

First question here.

I have two datasets, the first containing a sample result column and a datetime column (when the sample was taken). The other, Containing a bed number, start date and end date.

 

What I want to achieve is an averaged sample value for each bed.

i.e. average the sample results if they lay between the start and end dates of each bed.

 

Capture1.PNGCapture.PNG

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
PGStats
Opal | Level 21

For equal weighting, it is quite simple, as long as bed periods don't overlap:

 

proc sql;
create table want as
select 
    bed,
    start, end,
    mean(PercentGrind) as averageGrind
from 
    BED_DATETIME as a left join
    BATTERYFEEDSAMPLE as b on b.datetime between start and end
group by bed, start, end;
quit;
PG

View solution in original post

9 REPLIES 9
ballardw
Super User

Average sample of what?

 

What type of variables are your start, end and datetime? Numeric with datetime format applied or character? One cannot tell from a picture.

 

Please do not call values with a time component a "date" when discussing SAS variables. Dates and times use different units of measurement and when you say "date" that means days. And they don't compare well with time plus raising a question of are the "start" and "end" supposed to be considering the "date" portion or the actual datetime value the picture implies.

 

And since "bed" is involved per "What I want to achieve is an averaged sample value for each bed." then which "bed" is the "sample" from? Or are we supposed to guess which bed was measured?

 

Maybe something along these lines but completely untested as I can't write code to read pictures and you didn't provide names of data sets.

ZacLopresti
Obsidian | Level 7

Thanks for your reply.

 

Maybe a better way of explaining is as follows.

I have a bed (which is a mix of minerals), during the time that a specific bed is being used, multiple samples of the size fraction (PercentGrind) of the bed are taken. I want to average all the sample results taken within a specific bed.

 

I want the average of all the PercentGrind values if the corresponding DateTimes fall between the Startand End DateTimes of a specific Bed. 

 

To answer your questions:

- All datetime variables are numeric.

-Each sample has the DateTime at which it was taken recorded with it. If the DateTime is between the Start and End Datetimes it has been taken from that bed.

- I will try and upload the data not as oictures but had troubles in the past.

 

Thanks,

Zac

ZacLopresti
Obsidian | Level 7

data WORK.BATTERYFEEDSAMPLE;

infile datalines dsd truncover;

input DateTime:DATETIME20. PercentGrind:6.1;

format DateTime DATETIME20. PercentGrind 6.1;

label DateTime="SamplingStartTime" PercentGrind="Grind (%)";

datalines;

25JUN2016:13:00:00 89.9

26JUN2016:01:00:00 89.8

26JUN2016:13:00:00 87.0

27JUN2016:13:00:00 90.3

28JUN2016:13:00:00 90.6

29JUN2016:01:00:00 90.9

29JUN2016:01:01:00 .

29JUN2016:13:00:00 89.8

30JUN2016:01:00:00 90.2

30JUN2016:13:00:00 89.6

01JUL2016:01:00:00 90.6

01JUL2016:13:00:00 86.1

02JUL2016:01:00:00 86.9

02JUL2016:13:00:00 86.1

03JUL2016:01:00:00 86.1

03JUL2016:13:00:00 84.7

04JUL2016:01:00:00 88.0

04JUL2016:13:00:00 89.7

05JUL2016:01:00:00 89.4

05JUL2016:13:00:00 89.6

06JUL2016:01:00:00 86.8

06JUL2016:01:01:00 .

06JUL2016:13:00:00 87.8

07JUL2016:01:00:00 89.4

07JUL2016:13:00:00 89.8

08JUL2016:01:00:00 89.9

08JUL2016:13:00:00 89.8

09JUL2016:01:00:00 90.0

;;;;

 

data WORK.BED_DATETIME;

infile datalines dsd truncover;

input Bed:4. Start:DATETIME20. End:DATETIME20.;

format Bed 4. Start DATETIME20. End DATETIME20.;

label Bed="Bed";

datalines;

1668 26JUN2016:02:22:00 02JUL2016:03:29:59

1669 02JUL2016:03:30:00 08JUL2016:19:55:59

1670 08JUL2016:19:56:00 15JUL2016:20:18:59

1671 15JUL2016:20:19:00 22JUL2016:17:14:59

1672 22JUL2016:17:15:00 29JUL2016:14:26:59

1673 29JUL2016:14:27:00 05AUG2016:11:11:59

1674 05AUG2016:11:12:00 12AUG2016:13:41:59

;;;;

mkeintz
PROC Star

I presume you want the weighted average PERCENTGRIND for each bed (from dataset BEDS) where weights are determined by duration of the bed used within each period identified in your PCT_GRIND dataset (that's the name I assign it).  I also presume that the PCT_GRIND date/time stamps are the start of a given level of PERCENTGRIND.

 

Luckily (1) your bed usages do not overlap  (each END precedes the next START in the dataset (which I call BEDS),  and (2) both datasets are sorted chronologically (BEDS is sorted by START, and PCT_GRIND is sorted by DATETIME).  These features are assumed to this program.

 

Getting a weighted average is straightforward using PROC SUMMARY (also know as PROC MEANS).  The problem is generating a dataset with the weights (variable DURATION in data set NEED):

 

data need;
  set beds ;

  /* Exhaust PCT_GRIND periods that cover this START-END period */
  do until (datetime>end);
    if (datetime<=start<=nxt_dt)  or (datetime<=end<=nxt_dt) then do;
      duration =  1 + min(end,nxt_dt) - max(start,datetime);
      output;
    end;
    if nxt_dt>end then leave;  ** Read no more PCT_GRIND for this bed**;
    merge pct_grind
          pct_grind (firstobs=2 keep=datetime rename=(datetime=nxt_dt));
    format duration time10.0;
  end;
run;

proc summary data=need noprint;
  class bed;
  var percentgrind;  
  weight duration;   
  ways 1;
  output out=want mean=pctgrind_mean sumwgt=total_duration / autoname;
  format total_duration time10.0;
run;

For each bed, dataset NEED gets as many records as there are overlaps between the bed usage (START to END) with the time spans in PCT_GRIND (DATETIME to NXT_DT).  Each such record gets a variable called DURATION. That's where the 

duration =  1 + min(end,nxt_dt) - max(start,datetime);

comes in.  Note the NXT_DT is the datetime starting the next PCT_GRIND span.  It is obtained by use of the "FIRSTOBS=2" parameter in the self-merge activated by the MERGE statement.

 

Since PERCENTGRIND is missing for some time spans, any bed use that includes that time span will include only the neighboring non-missing spans that overlaps with bed usage.

 

Also any bed usage that is outside of the range of times in PCT_GRIND will not be processed at all.

 

I recommend you examine dataset NEED to see the intent of the data step programming.

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ZacLopresti
Obsidian | Level 7

Thanks for your reply.

I want all samples to be equally weight though (regardless of time).

I'll have a play with what you've sent and see if I can modify it.

mkeintz
PROC Star

@ZacLopresti wrote:

Thanks for your reply.

I want all samples to be equally weight though (regardless of time).

Then drop the WEIGHT DURATION statement in the proc summary.   And if you are never going to examine the exposure time to each PERCENTGRIND, you can also drop the calculation of DURATION in the DATA NEED step.

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ZacLopresti
Obsidian | Level 7
Worked well.
Thankyou.
PGStats
Opal | Level 21

For equal weighting, it is quite simple, as long as bed periods don't overlap:

 

proc sql;
create table want as
select 
    bed,
    start, end,
    mean(PercentGrind) as averageGrind
from 
    BED_DATETIME as a left join
    BATTERYFEEDSAMPLE as b on b.datetime between start and end
group by bed, start, end;
quit;
PG
ZacLopresti
Obsidian | Level 7
Fantastic. Thanks

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 1942 views
  • 3 likes
  • 4 in conversation