turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

Find a Community

- Home
- /
- SAS Programming
- /
- Base SAS Programming
- /
- Referencing dataset "A" to calculate information f...

Topic Options

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-16-2008 10:29 AM

Dataset "A" is composed of bird nesting information with a single nesting attempt coded per line with multiple variables. The important variables in this situation:

ID (unique sequential observation identifier);

AREA (study area - 1, 2, or 3);

YEAR (study year - 2005, 2006, or 2007);

START (start day of the nesting attempt - all dates are coded as 1 for January 1st continuing sequentially until the final day of that year);

FINISH (last day of nesting attempt).

Dataset "B" is composed of weather information with a single day per line of code with multiple measurement variables on each day. The important variables in this situation:

AREA (similar to above);

YEAR (study year - 2005, 2006, or 2007);

DAY (similar format to above);

H_TEMP (high temp for that day);

L_TEMP (low temp for that day);

M_TEMP (mean temp for that day);

PRECIP (precipitation in inches).

What I need to do is calculate a min, mean, and max H_TEMP, L_TEMP, and M_TEMP and a sum of all PRECIP over the nesting period for each nesting attempt in dataset "A" by AREA. Since the date codes used are not unique per year I'm assuming the code with have to be written on a by year basis, and ideally the output would be new variables added to Dataset "A" for each nesting attempt (but the created dataset "C" could always be merged with dataset "A" by ID if necessary, I'm assuming).

I've searched online for an example or code to help with this but have been unsuccessful in finding anything. Any code, help or keywords would be greatly appreciated.

Thanks,

Tim

ID (unique sequential observation identifier);

AREA (study area - 1, 2, or 3);

YEAR (study year - 2005, 2006, or 2007);

START (start day of the nesting attempt - all dates are coded as 1 for January 1st continuing sequentially until the final day of that year);

FINISH (last day of nesting attempt).

Dataset "B" is composed of weather information with a single day per line of code with multiple measurement variables on each day. The important variables in this situation:

AREA (similar to above);

YEAR (study year - 2005, 2006, or 2007);

DAY (similar format to above);

H_TEMP (high temp for that day);

L_TEMP (low temp for that day);

M_TEMP (mean temp for that day);

PRECIP (precipitation in inches).

What I need to do is calculate a min, mean, and max H_TEMP, L_TEMP, and M_TEMP and a sum of all PRECIP over the nesting period for each nesting attempt in dataset "A" by AREA. Since the date codes used are not unique per year I'm assuming the code with have to be written on a by year basis, and ideally the output would be new variables added to Dataset "A" for each nesting attempt (but the created dataset "C" could always be merged with dataset "A" by ID if necessary, I'm assuming).

I've searched online for an example or code to help with this but have been unsuccessful in finding anything. Any code, help or keywords would be greatly appreciated.

Thanks,

Tim

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-16-2008 03:17 PM

Your wish does sound like it would be too hard to solved. But I don't really understand how you want your calculation and output data. Can you please provide some sample input and prefered output?

Regards,

Linus

Regards,

Linus

Data never sleeps

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-16-2008 04:49 PM

Dataset "A" is coded like this:

ID AREA YEAR START FINISH

541 3 2006 172 211

542 3 2006 . .

543 3 2006 145 182

544 3 2006 121 167

545 3 2006 153 187

546 3 2006 171 204

547 3 2006 132 173

548 3 2006 . .

549 3 2006 129 167

550 3 2006 192 225

551 3 2006 110 .

552 3 2006 150 187

553 3 2006 . .

554 3 2006 168 .

555 3 2006 193 227

Dataset "B" is coded like this:

AREA DAY YEAR H_TEMP L_TEMP M_TEMP PRECIP

2 104 2003 78 40 59 0

2 105 2003 84 60 72 0

2 106 2003 82 33 57.5 0

2 107 2003 38 32 35 0.06

2 108 2003 42 35 38.5 0.31

2 109 2003 71 40 55.5 0.01

2 110 2003 73 58 65.5 0.24

2 111 2003 71 46 58.5 0.06

2 112 2003 48 33 40.5 0.03

2 113 2003 57 28 42.5 0

2 114 2003 64 33 48.5 0

2 115 2003 65 39 52 0

2 116 2003 60 31 45.5 0

2 117 2003 74 36 55 0

2 118 2003 78 55 66.5 0

2 119 2003 78 37 57.5 0

2 120 2003 64 46 55 0.13

2 121 2003 52 45 48.5 1.11

2 122 2003 51 42 46.5 0.02

2 123 2003 56 32 44 0

2 124 2003 64 31 47.5 0

2 125 2003 64 49 56.5 0.58

2 126 2003 63 49 56 0

2 127 2003 62 48 55 0.08

2 128 2003 63 46 54.5 0.01

As far as output goes I'm envisioning a dataset "C"with the following variables:

ID - individual nesting attempt identifier from dataset "A"

YEAR - coincides with both dataset "A" and "B"

AREA - coincides with both dataset "A" and "B"

min_H_TEMP - calculated (see below for how)

mean_H_TEMP - calculated (see below for how)

max_H_TEMP - calculated (see below for how)

min_L_TEMP - calculated (see below for how)

mean_L_TEMP - calculated (see below for how)

max_L_TEMP - calculated (see below for how)

min_M_TEMP - calculated (see below for how)

mean_M_TEMP - calculated (see below for how)

max_M_TEMP - calculated (see below for how)

sum_PRECIP - calculated (see below for how)

Each sample in dataset "A" has a unique identifier (ID) and a start (START) and an end (FINISH) date associated with a specific year of the study (YEAR) and a study area (AREA).

Dataset "B" has corresponding YEAR, AREA, and DAY (related to START and FINISH) information for weather variables (H_TEMP, L_TEMP, M_TEMP, and PRECIP).

I need to figure out how to write SAS code to use dataset "A" to find START and FINISH dates for an individual nesting attempt identified by ID, YEAR, and AREA and then take those values and calculate based on dataset "B" the min, max and mean H_TEMP, L_TEMP, and M_TEMP as well as the sum of all precipitation (PRECIP) for the specified interval (START->FINISH) at the same AREA in the same YEAR as the nesting attempt (ID) from dataset "A".

Here's an example from above starting in dataset "A" with this information:

ID=546

START=171

FINISH=204

AREA=3

YEAR=2006

For this sample I would need the mean, min, and max H_TEMP, L_TEMP, and M_TEMP and sum of PRECIP calculated for the interval starting on 171 and ending on 204 from dataset "B" for 2006 at area 3.

Hopefully this additional information helps,

Tim

ID AREA YEAR START FINISH

541 3 2006 172 211

542 3 2006 . .

543 3 2006 145 182

544 3 2006 121 167

545 3 2006 153 187

546 3 2006 171 204

547 3 2006 132 173

548 3 2006 . .

549 3 2006 129 167

550 3 2006 192 225

551 3 2006 110 .

552 3 2006 150 187

553 3 2006 . .

554 3 2006 168 .

555 3 2006 193 227

Dataset "B" is coded like this:

AREA DAY YEAR H_TEMP L_TEMP M_TEMP PRECIP

2 104 2003 78 40 59 0

2 105 2003 84 60 72 0

2 106 2003 82 33 57.5 0

2 107 2003 38 32 35 0.06

2 108 2003 42 35 38.5 0.31

2 109 2003 71 40 55.5 0.01

2 110 2003 73 58 65.5 0.24

2 111 2003 71 46 58.5 0.06

2 112 2003 48 33 40.5 0.03

2 113 2003 57 28 42.5 0

2 114 2003 64 33 48.5 0

2 115 2003 65 39 52 0

2 116 2003 60 31 45.5 0

2 117 2003 74 36 55 0

2 118 2003 78 55 66.5 0

2 119 2003 78 37 57.5 0

2 120 2003 64 46 55 0.13

2 121 2003 52 45 48.5 1.11

2 122 2003 51 42 46.5 0.02

2 123 2003 56 32 44 0

2 124 2003 64 31 47.5 0

2 125 2003 64 49 56.5 0.58

2 126 2003 63 49 56 0

2 127 2003 62 48 55 0.08

2 128 2003 63 46 54.5 0.01

As far as output goes I'm envisioning a dataset "C"with the following variables:

ID - individual nesting attempt identifier from dataset "A"

YEAR - coincides with both dataset "A" and "B"

AREA - coincides with both dataset "A" and "B"

min_H_TEMP - calculated (see below for how)

mean_H_TEMP - calculated (see below for how)

max_H_TEMP - calculated (see below for how)

min_L_TEMP - calculated (see below for how)

mean_L_TEMP - calculated (see below for how)

max_L_TEMP - calculated (see below for how)

min_M_TEMP - calculated (see below for how)

mean_M_TEMP - calculated (see below for how)

max_M_TEMP - calculated (see below for how)

sum_PRECIP - calculated (see below for how)

Each sample in dataset "A" has a unique identifier (ID) and a start (START) and an end (FINISH) date associated with a specific year of the study (YEAR) and a study area (AREA).

Dataset "B" has corresponding YEAR, AREA, and DAY (related to START and FINISH) information for weather variables (H_TEMP, L_TEMP, M_TEMP, and PRECIP).

I need to figure out how to write SAS code to use dataset "A" to find START and FINISH dates for an individual nesting attempt identified by ID, YEAR, and AREA and then take those values and calculate based on dataset "B" the min, max and mean H_TEMP, L_TEMP, and M_TEMP as well as the sum of all precipitation (PRECIP) for the specified interval (START->FINISH) at the same AREA in the same YEAR as the nesting attempt (ID) from dataset "A".

Here's an example from above starting in dataset "A" with this information:

ID=546

START=171

FINISH=204

AREA=3

YEAR=2006

For this sample I would need the mean, min, and max H_TEMP, L_TEMP, and M_TEMP and sum of PRECIP calculated for the interval starting on 171 and ending on 204 from dataset "B" for 2006 at area 3.

Hopefully this additional information helps,

Tim

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-16-2008 05:34 PM

Hi:

Just curious...what's supposed to happen with the missing values from A, as, for example on ID 542 or 548 or 553?? Should those observations be ignored????

Also, you say you want the min, max AND mean for h_temp, l_temp and m_temp...so you'd get:

min_h_temp, max_h_temp, and mean_h_temp???? (High)

min_l_temp, max_l_temp and min_l_temp??? (Low)

AND

min_m_temp, max_m_temp and mean_m_temp??? (Mean)

I don't understand why you'd calculate the mean of the mean...do you really want to add up all the mean temps and divide by the count of days? I thought taking the mean of the mean was not a great idea? (I'm not a heavy stat person -- I just know that statisticians that I've worked for (in the past) would never have let me calculate the mean of the mean.)

As I said, just curious -- I think what you want to do is do-able -- probably with some Proc TRANSPOSE and Data step programming, but some of what you need/want seems confusing to me.

cynthia

Just curious...what's supposed to happen with the missing values from A, as, for example on ID 542 or 548 or 553?? Should those observations be ignored????

Also, you say you want the min, max AND mean for h_temp, l_temp and m_temp...so you'd get:

min_h_temp, max_h_temp, and mean_h_temp???? (High)

min_l_temp, max_l_temp and min_l_temp??? (Low)

AND

min_m_temp, max_m_temp and mean_m_temp??? (Mean)

I don't understand why you'd calculate the mean of the mean...do you really want to add up all the mean temps and divide by the count of days? I thought taking the mean of the mean was not a great idea? (I'm not a heavy stat person -- I just know that statisticians that I've worked for (in the past) would never have let me calculate the mean of the mean.)

As I said, just curious -- I think what you want to do is do-able -- probably with some Proc TRANSPOSE and Data step programming, but some of what you need/want seems confusing to me.

cynthia

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-17-2008 07:21 AM

Cynthia--

Great questions

As far as missing values from "A" are concerned for now yes they can be ignored. If both values are missing a nest was not initiated for that observation, but if only FINISH is missing then it was an unsucessful nest and I need to go back to the log books and enter the date it failed into the code.

In general I would definitely agree with you that calculating the mean of the mean is a bad practice and if I had hourly weather data I could calculate the mean temp over the course of the interval but all I have is the mean temp for each day, which leaves me calculating the mean of a M_TEMP over the course of the nesting interval (granted not ideal but will hopefully provide an idea of weather conditions during that time when combined with the other measures).

I'll look into the TRANSPOSE function but I think I'm going to need help getting through the DATA STEP portion.

Thanks,

Tim

Great questions

As far as missing values from "A" are concerned for now yes they can be ignored. If both values are missing a nest was not initiated for that observation, but if only FINISH is missing then it was an unsucessful nest and I need to go back to the log books and enter the date it failed into the code.

In general I would definitely agree with you that calculating the mean of the mean is a bad practice and if I had hourly weather data I could calculate the mean temp over the course of the interval but all I have is the mean temp for each day, which leaves me calculating the mean of a M_TEMP over the course of the nesting interval (granted not ideal but will hopefully provide an idea of weather conditions during that time when combined with the other measures).

I'll look into the TRANSPOSE function but I think I'm going to need help getting through the DATA STEP portion.

Thanks,

Tim

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-17-2008 09:08 AM

Hi Tim

Your sample data don't match too well (i.e. different years and area codes in the two datasets). I changed them a bit hoping that my assumptions were right.

Have a look at the following code. It might give you some hints even if it might not be exactly what you're looking for.

The code prepares mainly one big denormalised dataset (kind of a "datamart"). It shouldn't be too difficult to get what you want by using proc means on this dataset - and there you can also decide how to handle missings.

I converted the 'year' and 'day' variables to a variable 'date' containing a SAS date (count of days since 1 January 1960). This way your dates are "unique" for the same day in different years.

HTH

Patrick

data nesting;

infile datalines truncover;

input ID AREA YEAR $4. START FINISH;

datalines;

541 3 2006 172 211

542 3 2006 . .

543 3 2006 145 182

544 3 2006 121 167

545 3 2006 153 187

546 3 2006 171 204

547 3 2006 132 173

548 3 2006 . .

549 3 2006 129 167

550 3 2006 192 225

551 3 2006 110 .

552 3 2006 150 187

553 3 2006 . .

554 3 2006 168 .

555 3 2006 193 227

;

data weather;

infile datalines truncover;

input AREA DAY YEAR $4. H_TEMP L_TEMP M_TEMP PRECIP;

format date ddmmyy10.;

date=input(year||'0101',yymmdd8.)+DAY;

datalines;

3 104 2006 78 40 59 0

3 105 2006 84 60 72 0

3 106 2006 82 33 57.5 0

3 107 2006 38 32 35 0.06

3 108 2006 42 35 38.5 0.31

3 109 2006 71 40 55.5 0.01

3 110 2006 73 58 65.5 0.24

3 111 2006 71 46 58.5 0.06

3 112 2006 48 33 40.5 0.03

3 113 2006 57 28 42.5 0

3 114 2006 64 33 48.5 0

3 115 2006 65 39 52 0

3 116 2006 60 31 45.5 0

3 117 2006 74 36 55 0

3 118 2006 78 55 66.5 0

3 119 2006 78 37 57.5 0

3 120 2006 64 46 55 0.13

3 121 2006 52 45 48.5 1.11

3 122 2006 51 42 46.5 0.02

3 123 2006 56 32 44 0

3 124 2006 64 31 47.5 0

3 125 2006 64 49 56.5 0.58

3 126 2006 63 49 56 0

3 127 2006 62 48 55 0.08

3 128 2006 63 46 54.5 0.01

;

data nesting2;

set nesting;

format date ddmmyy10.;

if start ne . and finish ne . then

do;

do ThisDay=start to finish;

date=input(year||'0101',yymmdd8.)+ThisDay;

output;

end;

end;

else

output;

run;

proc sql;

create table NestingAndWeather as

select *

from nesting2 l left join weather r

on l.date=r.date and l.area=r.area

order by id,area,date

;

quit;

Message was edited by: Patrick

Your sample data don't match too well (i.e. different years and area codes in the two datasets). I changed them a bit hoping that my assumptions were right.

Have a look at the following code. It might give you some hints even if it might not be exactly what you're looking for.

The code prepares mainly one big denormalised dataset (kind of a "datamart"). It shouldn't be too difficult to get what you want by using proc means on this dataset - and there you can also decide how to handle missings.

I converted the 'year' and 'day' variables to a variable 'date' containing a SAS date (count of days since 1 January 1960). This way your dates are "unique" for the same day in different years.

HTH

Patrick

data nesting;

infile datalines truncover;

input ID AREA YEAR $4. START FINISH;

datalines;

541 3 2006 172 211

542 3 2006 . .

543 3 2006 145 182

544 3 2006 121 167

545 3 2006 153 187

546 3 2006 171 204

547 3 2006 132 173

548 3 2006 . .

549 3 2006 129 167

550 3 2006 192 225

551 3 2006 110 .

552 3 2006 150 187

553 3 2006 . .

554 3 2006 168 .

555 3 2006 193 227

;

data weather;

infile datalines truncover;

input AREA DAY YEAR $4. H_TEMP L_TEMP M_TEMP PRECIP;

format date ddmmyy10.;

date=input(year||'0101',yymmdd8.)+DAY;

datalines;

3 104 2006 78 40 59 0

3 105 2006 84 60 72 0

3 106 2006 82 33 57.5 0

3 107 2006 38 32 35 0.06

3 108 2006 42 35 38.5 0.31

3 109 2006 71 40 55.5 0.01

3 110 2006 73 58 65.5 0.24

3 111 2006 71 46 58.5 0.06

3 112 2006 48 33 40.5 0.03

3 113 2006 57 28 42.5 0

3 114 2006 64 33 48.5 0

3 115 2006 65 39 52 0

3 116 2006 60 31 45.5 0

3 117 2006 74 36 55 0

3 118 2006 78 55 66.5 0

3 119 2006 78 37 57.5 0

3 120 2006 64 46 55 0.13

3 121 2006 52 45 48.5 1.11

3 122 2006 51 42 46.5 0.02

3 123 2006 56 32 44 0

3 124 2006 64 31 47.5 0

3 125 2006 64 49 56.5 0.58

3 126 2006 63 49 56 0

3 127 2006 62 48 55 0.08

3 128 2006 63 46 54.5 0.01

;

data nesting2;

set nesting;

format date ddmmyy10.;

if start ne . and finish ne . then

do;

do ThisDay=start to finish;

date=input(year||'0101',yymmdd8.)+ThisDay;

output;

end;

end;

else

output;

run;

proc sql;

create table NestingAndWeather as

select *

from nesting2 l left join weather r

on l.date=r.date and l.area=r.area

order by id,area,date

;

quit;

Message was edited by: Patrick

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-17-2008 11:20 AM

Hi:

Since the year and day together make up a Julian date, I'd be tempted to do something like this instead of using arithmetic with a starting value of 0101:

[pre]

schar = catt(year,put(start,z3.));

sdate=input(schar,julian7.);

fchar = catt(year,put(finish,z3.));

fdate=input(fchar,julian7.);

wchar = catt(year,put(day,z3.));

wdate = input(wdate,julian7.);

[/pre]

You still have to check for missing, etc, etc. I show it as 2 separate statements just for ease of understanding how the date is being turned first into a character value with year and the day value concatenated together and then the INPUT function uses the JULIAN informat to translate the julian form of the date to the SAS internal number.

cynthia

Since the year and day together make up a Julian date, I'd be tempted to do something like this instead of using arithmetic with a starting value of 0101:

[pre]

schar = catt(year,put(start,z3.));

sdate=input(schar,julian7.);

fchar = catt(year,put(finish,z3.));

fdate=input(fchar,julian7.);

wchar = catt(year,put(day,z3.));

wdate = input(wdate,julian7.);

[/pre]

You still have to check for missing, etc, etc. I show it as 2 separate statements just for ease of understanding how the date is being turned first into a character value with year and the day value concatenated together and then the INPUT function uses the JULIAN informat to translate the julian form of the date to the SAS internal number.

cynthia

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Highlight
- Email to a Friend
- Report Inappropriate Content

10-17-2008 04:35 PM

Patrick and Cynthia--

A big thank you goes out to both of you for your help. I'm working on the code to compute means using PROC MEANS (output by ID) which is working I just need to update my files and code it the rest of the way through.

If I hit another wall I'll be sure to post up again. Thanks for the help,

Tim

A big thank you goes out to both of you for your help. I'm working on the code to compute means using PROC MEANS (output by ID) which is working I just need to update my files and code it the rest of the way through.

If I hit another wall I'll be sure to post up again. Thanks for the help,

Tim