Hello,
I have a dataset with the following three columns: site, dateTime, precipitationType. To keep it simple, let's assume I have two sites: KBOS and KACY. Datetime is a variable for the date and time of a particular observation. Precipitation type (again, keeping it simple) includes -RA, RA, +RA (for light, moderate, and heavy rain) and -SN, SN, +SN (for snow).
I'd like a report that contains columns for site, date, minutes of precipitation, minutes of RA, minutes of SN. I'd also like to, say, filter for dates with at least X minutes of precipitation and maybe include columns for -RA, RA, and +RA.
Any/all suggestions are welcome.
Thanks, Bruce
data have;
infile cards dsd;
informat site $4. datetime datetime. precipitationtype $3.;
format site $4. datetime datetime. precipitationtype $3.;
input site datetime precipitationtype;
cards;
kbos,01JAN2016:05:00:00,-RA
kbos,01JAN2016:05:01:00,-RA
kbos,01JAN2016:05:10:00,+RA
kacy,03JAN2016:07:20:00,-SN
kacy,03JAN2016:07:51:00,SN
kacy,03JAN2016:07:52:00,+SN
kbos,03JAN2016:16:21:00,-RA
kbos,03JAN2016:16:22:00,-RA
kbos,03JAN2016:16:26:00,+RA
kacy,04JAN2016:05:20:00,-SN
kacy,04JAN2016:07:51:00,SN
kacy,04JAN2016:07:52:00,+SN
;
run;
data temp;
set have;
date=datepart(datetime);
type=compress(precipitationtype,,'ka');
format date date9.;
run;
proc sql;
create table want as
select site,date,sum(type='RA') as RA,sum(type='SN') as SN,count(*) as total
from temp
group by site,date;
quit;
Is there a column to show participation start / stop? With these columns how do you see how many minutes of participation? Provide something like this to show the dataset you currently have:
data have;
infile cards dsd;
informat site $4. datetime datetime. precipitationtype $3.;
format site $4. datetime datetime. precipitationtype $3.;
input site datetime precipitationtype;
cards;
kbos,01JAN2016:05:00:00,-RA
kbos,02JAN2016:05:00:00,RA
kbos,03JAN2016:05:00:00,+RA
kacy,01JAN2016:05:00:00,-SN
kacy,02JAN2016:05:00:00,SN
kacy,03JAN2016:05:00:00,+SN
;
run;
@Steelers_In_DC wrote:
Is there a column to show participation start / stop? With these columns how do you see how many minutes of participation? Provide something like this to show the dataset you currently have:
data have;
infile cards dsd;
informat site $4. datetime datetime. precipitationtype $3.;
format site $4. datetime datetime. precipitationtype $3.;
input site datetime precipitationtype;
cards;
kbos,01JAN2016:05:00:00,-RA
kbos,02JAN2016:05:01:00,-RA
kbos,03JAN2016:05:10:00,+RA
kacy,01JAN2016:05:20:00,-SN
kacy,02JAN2016:07:51:00,SN
kacy,03JAN2016:07:52:00,+SN
;
run;
-- this means there was -RA at 0500 and 0501, +RA at 0510, -SN at 0520, SN at 0751 and +SN at 0752 -- otherwise, no precipitation.
Hi,
Please see updates to the data you included. Sorry for not being clearer in my initial submission. Each site can have only one type of weather at a given time, as I've shown. Also, I do not have start/end times. When there is precipitation (recorded once per minute), there is an observation. When there is no precipitation, there is no observation.
thanks, Bruce
You have enough going on that you should provide some example data and what the final result for that data would be.
Also, are your data point datetimes the begining, end or middle of a period? Are the reported times fixed interval (every 5 minutes) or variable interval?
If they are variable how are going to caluclate "minutes of precipitation"? I would expect a calcultion like that to come from a start/end time. Are you working from someone else's summarized data?
@ballardw wrote:
You have enough going on that you should provide some example data and what the final result for that data would be.
Also, are your data point datetimes the begining, end or middle of a period? Are the reported times fixed interval (every 5 minutes) or variable interval?
If they are variable how are going to caluclate "minutes of precipitation"? I would expect a calcultion like that to come from a start/end time. Are you working from someone else's summarized data?
Hello. In the previous reply I updated a sample dataset (yes, sorry I did not provide) and answered some of your questions. Reported times are every minute -- but only when there is precipitation. I'm working from my own data. The original data does have start/end times, but I thought it would be cleaner just to note what was happening every minute. Also, I will typically have a numerous entries in a row (0501, 0502, 0503, ...) for, say, -RA, but not always. Here's a fuller sample dataset (again, building on the sample provided earlier)
data have;
infile cards dsd;
informat site $4. datetime datetime. precipitationtype $3.;
format site $4. datetime datetime. precipitationtype $3.;
input site datetime precipitationtype;
cards;
kbos,01JAN2016:05:00:00,-RA
kbos,01JAN2016:05:01:00,-RA
kbos,01JAN2016:05:10:00,+RA
kacy,03JAN2016:07:20:00,-SN
kacy,03JAN2016:07:51:00,SN
kacy,03JAN2016:07:52:00,+SN
kbos,03JAN2016:16:21:00,-RA
kbos,03JAN2016:16:22:00,-RA
kbos,03JAN2016:16:26:00,+RA
kacy,04JAN2016:05:20:00,-SN
kacy,04JAN2016:07:51:00,SN
kacy,04JAN2016:07:52:00,+SN
;
run;
NOTE: I did not see that the earlier reply (from Steelers_In_DC) had different dates, so my short description of what is happening at 0500, 0501, etc. is correct, but the observations are occuring on different dates. I changed some of these and added more dates. This is more typical: a few observations (or many, actually, up to 1440) on a given day.
Thanks, Bruce
My output would look like this:
Site Date RA SN Total
KBOS 01Jan2016 3 0 3
KBOS 03Jan2016 3 0 3
KACY 03Jan2016 0 3 3
KACY 04Jan2016 0 3 3
Thanks, Bruce
You could start with something like this:
data precipMinutes;
length broadPrecipitationType $2;
set have; by site;
lastTime = lag(dateTime);
if first.site then do;
call missing(lastTime);
end;
else do;
minutes = intck("minute", lastTime, dateTime, "continuous");
broadPrecipitationType = compress(precipitationType, "+-");
date = datepart(dateTime);
if not missing(precipitationType) then output;
end;
format date yymmdd10.;
keep site dateTime date precipitationType broadPrecipitationType minutes;
run;
proc tabulate data=precipMinutes format=4.0;
class site date broadPrecipitationType;
var minutes;
table site*date, (broadPrecipitationType all)*minutes*sum="";
run;
Hello,
Given the fuller description of my data:
data have; infile cards dsd; informat site $4. datetime datetime. precipitationtype $3.; format site $4. datetime datetime. precipitationtype $3.; input site datetime precipitationtype; cards; kbos,01JAN2016:05:00:00,-RA kbos,01JAN2016:05:01:00,-RA kbos,01JAN2016:05:10:00,+RA kacy,03JAN2016:07:20:00,-SN kacy,03JAN2016:07:51:00,SN kacy,03JAN2016:07:52:00,+SN kbos,03JAN2016:16:21:00,-RA kbos,03JAN2016:16:22:00,-RA kbos,03JAN2016:16:26:00,+RA kacy,04JAN2016:05:20:00,-SN kacy,04JAN2016:07:51:00,SN kacy,04JAN2016:07:52:00,+SN ; run;
does your solution still apply? Also, I want to report precipitation for each day -- the idea being we will see which days had a lot of precipitation and what types fell. Finally, in your code, what does this do:
if first. site then do; call missing(lastTime); end;
Thanks! Bruce
data have;
infile cards dsd;
informat site $4. datetime datetime. precipitationtype $3.;
format site $4. datetime datetime. precipitationtype $3.;
input site datetime precipitationtype;
cards;
kbos,01JAN2016:05:00:00,-RA
kbos,01JAN2016:05:01:00,-RA
kbos,01JAN2016:05:10:00,+RA
kacy,03JAN2016:07:20:00,-SN
kacy,03JAN2016:07:51:00,SN
kacy,03JAN2016:07:52:00,+SN
kbos,03JAN2016:16:21:00,-RA
kbos,03JAN2016:16:22:00,-RA
kbos,03JAN2016:16:26:00,+RA
kacy,04JAN2016:05:20:00,-SN
kacy,04JAN2016:07:51:00,SN
kacy,04JAN2016:07:52:00,+SN
;
run;
data temp;
set have;
date=datepart(datetime);
type=compress(precipitationtype,,'ka');
format date date9.;
run;
proc sql;
create table want as
select site,date,sum(type='RA') as RA,sum(type='SN') as SN,count(*) as total
from temp
group by site,date;
quit;
@Ksharp wrote:
data have; infile cards dsd; informat site $4. datetime datetime. precipitationtype $3.; format site $4. datetime datetime. precipitationtype $3.; input site datetime precipitationtype; cards; kbos,01JAN2016:05:00:00,-RA kbos,01JAN2016:05:01:00,-RA kbos,01JAN2016:05:10:00,+RA kacy,03JAN2016:07:20:00,-SN kacy,03JAN2016:07:51:00,SN kacy,03JAN2016:07:52:00,+SN kbos,03JAN2016:16:21:00,-RA kbos,03JAN2016:16:22:00,-RA kbos,03JAN2016:16:26:00,+RA kacy,04JAN2016:05:20:00,-SN kacy,04JAN2016:07:51:00,SN kacy,04JAN2016:07:52:00,+SN ; run; data temp; set have; date=datepart(datetime); type=compress(precipitationtype,,'ka'); format date date9.; run; proc sql; create table want as select site,date,sum(type='RA') as RA,sum(type='SN') as SN,count(*) as total from temp group by site,date; quit;
Thanks very much! I was not aware that proc sql allowed conditions (type='SN') in the select statement. That's very handy.
Thanks again, Bruce
My code reflected the initial description of your data, or at least my understanding of it. It doesn't take advantage of the fact that every observation represents a single minute.
PG,
Yes, I apologize for not indicating in my initial post that each observation corresponded to 1 minute of precipitation. Thanks for your suggestion - and also for your suggestion on another post of mine, which I will use/get back to when I next receive data that needs processing and uploading, which will be soon.
Bruce
Sine you're looking for a "report" you may want to consider proc tabulate instead of proc sql.
Building on @Ksharp's code, you can use Proc Tabulate to generate your report.
proc tabulate data=temp;
class site date type;
table site*date, (type all='Total')*n=''*f=8./misstext='0';
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.