DATA Step, Macro, Functions and more

Referencing dataset "A" to calculate information from dataset "B"

Reply
N/A
Posts: 0

Referencing dataset "A" to calculate information from dataset "B"

Dataset "A" is composed of bird nesting information with a single nesting attempt coded per line with multiple variables. The important variables in this situation:
ID (unique sequential observation identifier);
AREA (study area - 1, 2, or 3);
YEAR (study year - 2005, 2006, or 2007);
START (start day of the nesting attempt - all dates are coded as 1 for January 1st continuing sequentially until the final day of that year);
FINISH (last day of nesting attempt).

Dataset "B" is composed of weather information with a single day per line of code with multiple measurement variables on each day. The important variables in this situation:
AREA (similar to above);
YEAR (study year - 2005, 2006, or 2007);
DAY (similar format to above);
H_TEMP (high temp for that day);
L_TEMP (low temp for that day);
M_TEMP (mean temp for that day);
PRECIP (precipitation in inches).

What I need to do is calculate a min, mean, and max H_TEMP, L_TEMP, and M_TEMP and a sum of all PRECIP over the nesting period for each nesting attempt in dataset "A" by AREA. Since the date codes used are not unique per year I'm assuming the code with have to be written on a by year basis, and ideally the output would be new variables added to Dataset "A" for each nesting attempt (but the created dataset "C" could always be merged with dataset "A" by ID if necessary, I'm assuming).

I've searched online for an example or code to help with this but have been unsuccessful in finding anything. Any code, help or keywords would be greatly appreciated.

Thanks,

Tim
Super User
Posts: 5,424

Re: Referencing dataset "A" to calculate information from dataset "B"

Posted in reply to deleted_user
Your wish does sound like it would be too hard to solved. But I don't really understand how you want your calculation and output data. Can you please provide some sample input and prefered output?

Regards,
Linus
Data never sleeps
N/A
Posts: 0

Re: Referencing dataset "A" to calculate information from dataset "B"

Dataset "A" is coded like this:

ID AREA YEAR START FINISH
541 3 2006 172 211
542 3 2006 . .
543 3 2006 145 182
544 3 2006 121 167
545 3 2006 153 187
546 3 2006 171 204
547 3 2006 132 173
548 3 2006 . .
549 3 2006 129 167
550 3 2006 192 225
551 3 2006 110 .
552 3 2006 150 187
553 3 2006 . .
554 3 2006 168 .
555 3 2006 193 227

Dataset "B" is coded like this:

AREA DAY YEAR H_TEMP L_TEMP M_TEMP PRECIP
2 104 2003 78 40 59 0
2 105 2003 84 60 72 0
2 106 2003 82 33 57.5 0
2 107 2003 38 32 35 0.06
2 108 2003 42 35 38.5 0.31
2 109 2003 71 40 55.5 0.01
2 110 2003 73 58 65.5 0.24
2 111 2003 71 46 58.5 0.06
2 112 2003 48 33 40.5 0.03
2 113 2003 57 28 42.5 0
2 114 2003 64 33 48.5 0
2 115 2003 65 39 52 0
2 116 2003 60 31 45.5 0
2 117 2003 74 36 55 0
2 118 2003 78 55 66.5 0
2 119 2003 78 37 57.5 0
2 120 2003 64 46 55 0.13
2 121 2003 52 45 48.5 1.11
2 122 2003 51 42 46.5 0.02
2 123 2003 56 32 44 0
2 124 2003 64 31 47.5 0
2 125 2003 64 49 56.5 0.58
2 126 2003 63 49 56 0
2 127 2003 62 48 55 0.08
2 128 2003 63 46 54.5 0.01

As far as output goes I'm envisioning a dataset "C"with the following variables:

ID - individual nesting attempt identifier from dataset "A"
YEAR - coincides with both dataset "A" and "B"
AREA - coincides with both dataset "A" and "B"
min_H_TEMP - calculated (see below for how)
mean_H_TEMP - calculated (see below for how)
max_H_TEMP - calculated (see below for how)
min_L_TEMP - calculated (see below for how)
mean_L_TEMP - calculated (see below for how)
max_L_TEMP - calculated (see below for how)
min_M_TEMP - calculated (see below for how)
mean_M_TEMP - calculated (see below for how)
max_M_TEMP - calculated (see below for how)
sum_PRECIP - calculated (see below for how)

Each sample in dataset "A" has a unique identifier (ID) and a start (START) and an end (FINISH) date associated with a specific year of the study (YEAR) and a study area (AREA).

Dataset "B" has corresponding YEAR, AREA, and DAY (related to START and FINISH) information for weather variables (H_TEMP, L_TEMP, M_TEMP, and PRECIP).

I need to figure out how to write SAS code to use dataset "A" to find START and FINISH dates for an individual nesting attempt identified by ID, YEAR, and AREA and then take those values and calculate based on dataset "B" the min, max and mean H_TEMP, L_TEMP, and M_TEMP as well as the sum of all precipitation (PRECIP) for the specified interval (START->FINISH) at the same AREA in the same YEAR as the nesting attempt (ID) from dataset "A".

Here's an example from above starting in dataset "A" with this information:
ID=546
START=171
FINISH=204
AREA=3
YEAR=2006

For this sample I would need the mean, min, and max H_TEMP, L_TEMP, and M_TEMP and sum of PRECIP calculated for the interval starting on 171 and ending on 204 from dataset "B" for 2006 at area 3.

Hopefully this additional information helps,

Tim
SAS Super FREQ
Posts: 8,861

Re: Referencing dataset "A" to calculate information from dataset "B"

Posted in reply to deleted_user
Hi:
Just curious...what's supposed to happen with the missing values from A, as, for example on ID 542 or 548 or 553?? Should those observations be ignored????

Also, you say you want the min, max AND mean for h_temp, l_temp and m_temp...so you'd get:

min_h_temp, max_h_temp, and mean_h_temp???? (High)
min_l_temp, max_l_temp and min_l_temp??? (Low)
AND
min_m_temp, max_m_temp and mean_m_temp??? (Mean)

I don't understand why you'd calculate the mean of the mean...do you really want to add up all the mean temps and divide by the count of days? I thought taking the mean of the mean was not a great idea? (I'm not a heavy stat person -- I just know that statisticians that I've worked for (in the past) would never have let me calculate the mean of the mean.)

As I said, just curious -- I think what you want to do is do-able -- probably with some Proc TRANSPOSE and Data step programming, but some of what you need/want seems confusing to me.

cynthia
N/A
Posts: 0

Re: Referencing dataset "A" to calculate information from dataset "B"

Posted in reply to Cynthia_sas
Cynthia--

Great questionsSmiley Happy

As far as missing values from "A" are concerned for now yes they can be ignored. If both values are missing a nest was not initiated for that observation, but if only FINISH is missing then it was an unsucessful nest and I need to go back to the log books and enter the date it failed into the code.

In general I would definitely agree with you that calculating the mean of the mean is a bad practice and if I had hourly weather data I could calculate the mean temp over the course of the interval but all I have is the mean temp for each day, which leaves me calculating the mean of a M_TEMP over the course of the nesting interval (granted not ideal but will hopefully provide an idea of weather conditions during that time when combined with the other measures).

I'll look into the TRANSPOSE function but I think I'm going to need help getting through the DATA STEP portion.

Thanks,

Tim
Respected Advisor
Posts: 4,173

Re: Referencing dataset "A" to calculate information from dataset "B"

Posted in reply to deleted_user
Hi Tim

Your sample data don't match too well (i.e. different years and area codes in the two datasets). I changed them a bit hoping that my assumptions were right.

Have a look at the following code. It might give you some hints even if it might not be exactly what you're looking for.

The code prepares mainly one big denormalised dataset (kind of a "datamart"). It shouldn't be too difficult to get what you want by using proc means on this dataset - and there you can also decide how to handle missings.

I converted the 'year' and 'day' variables to a variable 'date' containing a SAS date (count of days since 1 January 1960). This way your dates are "unique" for the same day in different years.

HTH
Patrick

data nesting;
infile datalines truncover;
input ID AREA YEAR $4. START FINISH;
datalines;
541 3 2006 172 211
542 3 2006 . .
543 3 2006 145 182
544 3 2006 121 167
545 3 2006 153 187
546 3 2006 171 204
547 3 2006 132 173
548 3 2006 . .
549 3 2006 129 167
550 3 2006 192 225
551 3 2006 110 .
552 3 2006 150 187
553 3 2006 . .
554 3 2006 168 .
555 3 2006 193 227
;
data weather;
infile datalines truncover;
input AREA DAY YEAR $4. H_TEMP L_TEMP M_TEMP PRECIP;
format date ddmmyy10.;
date=input(year||'0101',yymmdd8.)+DAY;
datalines;
3 104 2006 78 40 59 0
3 105 2006 84 60 72 0
3 106 2006 82 33 57.5 0
3 107 2006 38 32 35 0.06
3 108 2006 42 35 38.5 0.31
3 109 2006 71 40 55.5 0.01
3 110 2006 73 58 65.5 0.24
3 111 2006 71 46 58.5 0.06
3 112 2006 48 33 40.5 0.03
3 113 2006 57 28 42.5 0
3 114 2006 64 33 48.5 0
3 115 2006 65 39 52 0
3 116 2006 60 31 45.5 0
3 117 2006 74 36 55 0
3 118 2006 78 55 66.5 0
3 119 2006 78 37 57.5 0
3 120 2006 64 46 55 0.13
3 121 2006 52 45 48.5 1.11
3 122 2006 51 42 46.5 0.02
3 123 2006 56 32 44 0
3 124 2006 64 31 47.5 0
3 125 2006 64 49 56.5 0.58
3 126 2006 63 49 56 0
3 127 2006 62 48 55 0.08
3 128 2006 63 46 54.5 0.01
;

data nesting2;
set nesting;
format date ddmmyy10.;
if start ne . and finish ne . then
do;
do ThisDay=start to finish;
date=input(year||'0101',yymmdd8.)+ThisDay;
output;
end;
end;
else
output;
run;

proc sql;
create table NestingAndWeather as
select *
from nesting2 l left join weather r
on l.date=r.date and l.area=r.area
order by id,area,date
;
quit;

Message was edited by: Patrick
SAS Super FREQ
Posts: 8,861

Re: Referencing dataset "A" to calculate information from dataset "B"

Hi:
Since the year and day together make up a Julian date, I'd be tempted to do something like this instead of using arithmetic with a starting value of 0101:
[pre]
schar = catt(year,put(start,z3.));
sdate=input(schar,julian7.);

fchar = catt(year,put(finish,z3.));
fdate=input(fchar,julian7.);

wchar = catt(year,put(day,z3.));
wdate = input(wdate,julian7.);
[/pre]

You still have to check for missing, etc, etc. I show it as 2 separate statements just for ease of understanding how the date is being turned first into a character value with year and the day value concatenated together and then the INPUT function uses the JULIAN informat to translate the julian form of the date to the SAS internal number.

cynthia
N/A
Posts: 0

Re: Referencing dataset "A" to calculate information from dataset "B"

Posted in reply to Cynthia_sas
Patrick and Cynthia--

A big thank you goes out to both of you for your help. I'm working on the code to compute means using PROC MEANS (output by ID) which is working I just need to update my files and code it the rest of the way through.

If I hit another wall I'll be sure to post up again. Thanks for the help,

Tim
Ask a Question
Discussion stats
  • 7 replies
  • 187 views
  • 0 likes
  • 4 in conversation