BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
kelagory
Calcite | Level 5

I have a large data set of bird sound recordings and want to calculate the mean number of times a bird species was detected each day. All of the bird species in the data set are not detected each day. The mean values are inflated because they are only including non-zero values. How do I assign a value of 0 to days when the species is not detected? Here is a sample of my data:

 

kelagory_0-1706741302236.png

 

From this data set, I output another using a proc means statement that provides the number of observations for each species:

   proc means noprint; by species trt year month day; var prob;
   output out=species_trt_day n=N_species_trt_day;

 

A simple command of:

   if N_species_trt_day=. then N_species_trt_day=0;" doesn't work because there are no observations for some species on any given day.

 

Short of creating a matrix of zeroes for all bird *date combinations and merging that data set with the original, what can I do to accomplish this?

 

Thanks for any help you can provide!

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

See if below gets you a step further. The code will add rows for days where you didn't observe a specific species (with a count of zero). I have also combined your day, month, year variables into a single variable detection_dt with a SAS date value. There are formats and functions should you ever need the day, month or year only from such a SAS date value.

 

data have;
  infile datalines truncover dlm=' ';
  input species:$20. trt:$20. year month day N_detections;
  detection_dt=mdy(month,day,year);
  format detection_dt date9.;
  drop year month day;
  datalines;
barswa corn 2020 6 9 2
barswa switchgrass 2020 6 9 1
bnhcow corn 2020 6 9 56
bnhcow switchgrass 2020 6 9 9
comyel corn 2020 6 9 1
easpho corn 2020 6 9 2
easpho switchgrass 2020 6 9 2
fiespa corn 2020 6 9 3
fiespa switchgrass 2020 6 9 4
graspa corn 2020 6 9 1
graspa switchgrass 2020 6 9 27
henspa corn 2020 6 9 2
indbun corn 2020 6 9 15
bnhcow corn 2020 6 10 168
bnhcow switchgrass 2020 6 10 15
chiswi switchgrass 2020 6 10 1
easpho corn 2020 6 10 2
easpho switchgrass 2020 6 10 2
fiespa corn 2020 6 10 17
fiespa switchgrass 2020 6 10 15
henspa switchgrass 2020 6 10 1
indbun corn 2020 6 10 75
indbun switchgrass 2020 6 10 9
vesspa corn 2020 6 10 4
vesspa switchgrass 2020 6 10 1
bnhcow corn 2020 6 11 80
bnhcow switchgrass 2020 6 11 30
easpho switchgrass 2020 6 11 1
fiespa corn 2020 6 11 73
fiespa switchgrass 2020 6 11 102
graspa switchgrass 2020 6 11 2
indbun corn 2020 6 11 94
indbun switchgrass 2020 6 11 31
vesspa corn 2020 6 11 7
vesspa switchgrass 2020 6 11 6
;

proc sql;
  create table want as
  select 
    l.species,
    l.trt,
    l.detection_dt,
    coalesce(r.N_detections,0) as N_detections
  from 
    (
      select a.species, a.trt, b.detection_dt
      from
        (
          select distinct species, trt
          from have
        ) a
        ,
        (
          select distinct detection_dt
          from have
        ) b
    ) l

    left join
    have r
    on l.species=r.species and l.trt=r.trt and l.detection_dt=r.detection_dt

    order by species, trt, detection_dt
  ;
quit;

proc means noprint data=want;
  where upcase(species) in ('BARSWA','BNHCOW','CHISWI','CLISWA','COMYEL','DICKCI','EASKIN',
                            'EASPHO','FIESPA','GRASPA','HENSPA','INDBUN','PURMAR','SEDWRE','VESSPA');
  by species trt detection_dt;
  var N_detections;
  /*var prob;*/
  output out=species_trt_day n=N_detections;
run;

 

 

View solution in original post

11 REPLIES 11
ballardw
Super User

What, in some detail, is that desired mean supposed to represent?

 

I do not know your data obviously. But I have concerns that you might be forcing "dates" to have invalid counts. Consider birds that migrate to / from the area you are collecting data. You might be forcing dates to be included when all the birds of one or more species have migrated out. Or they just move around a lot in a wide local area. Or there just aren't that many of a given species in the area. Plus you show something labeled Hour which I have to assume has something to do with time of day. Some birds are active at different times and your forced zeroes might include the times that species isn't normally active ( nocturnal/ diurnal basic difference)

 

That really isn't much of  sample as you talk about "detected each day" and show code with a variable N_species_trt_day but that does not appear in the picture of your data. What you show also does not seem to include "detection" but possibly some calculated rate.

kelagory
Calcite | Level 5

The mean is intended to represent the mean number of detections for each species over a 4-month period during the breeding season. Only species observed throughout that period are included. Admittedly, the data I included in my original post was confusing. See response to Patrick and additional description there.

 

Thanks for your help.

PaigeMiller
Diamond | Level 26

Short of creating a matrix of zeroes for all bird *date combinations and merging that data set with the original, what can I do to accomplish this?

 

Why are you ruling this out? It seems like a reasonable and relatively simple thing to do.

--
Paige Miller
kelagory
Calcite | Level 5

I am not familiar with a command-based way to do this, but will look into this further. I have 15 species over 3 years, each with 4 months and the days associated  with them. In the past, I've done this manually with excel.

PaigeMiller
Diamond | Level 26

@kelagory wrote:

I am not familiar with a command-based way to do this, but will look into this further. I have 15 species over 3 years, each with 4 months and the days associated  with them. In the past, I've done this manually with excel.


Cartesian join in PROC SQL. https://support.sas.com/kb/25/270.html

--
Paige Miller
Patrick
Opal | Level 21

At separate table with all expected birds and all expected dates used in a SQL join condition could do. ...and of course these two tables could get created dynamically based on your main table. 

If you want a code sample then please post your sample data not as screenshot but in a consumable form as text - ideally as part of a data /datalines script that creates the SAS table already.

kelagory
Calcite | Level 5

Patrick:

 

Thanks for the response. The screenshot I included previously was confusing. That screenshot was for a data set that includes all individual detections made throughout each day. The attached file excerpt, which includes the number of detections for each species each day, is part of a much larger and comprehensive file. The number of detections for each species was calculated using this script:

 

data select_species; set prob75;
if sensor='Swift2' or sensor='Swift4' or sensor='Swift6' then trt='switchgrass'; else trt='corn';
if species='barswa' or species='bnhcow' or species='chiswi' or species='cliswa' or species='comyel' or species='dickci' or species='easkin' or species='easpho'
or species='fiespa' or species='graspa' or species='henspa' or species='indbun' or species='purmar' or species='sedwre' or species='vesspa';
proc sort; by species trt year month day;
proc means noprint; by species trt year month day; var prob;
output out=species_trt_day n=N_detections;

 

As you can see from the script above, I've selected 15 species for this analysis, but all species are not detected each day. 

 

Let me know if you have questions or need more detail.

ballardw
Super User

Not addressing the whole issue but you do not need all of those ORs in the comparisons. If you want to search one variable for a list of values use the IN operator.

Example, instead of

if sensor='Swift2' or sensor='Swift4' or sensor='Swift6' then trt='switchgrass';

use

if sensor in ( 'Swift2' 'Swift4' 'Swift6'  ) then trt='switchgrass'

Also, strongly recommend using the RUN; statement to end procedures and data steps (or QUIT where needed).

While SAS will help you generally as your code gets more sophisticated or complicated, possibly involving macro coding, the missing Run occasionally will cause problems and especially with macro generated code may be hard to diagnose why a problem occurs.

kelagory
Calcite | Level 5

Thanks for the suggestions!

Patrick
Opal | Level 21

See if below gets you a step further. The code will add rows for days where you didn't observe a specific species (with a count of zero). I have also combined your day, month, year variables into a single variable detection_dt with a SAS date value. There are formats and functions should you ever need the day, month or year only from such a SAS date value.

 

data have;
  infile datalines truncover dlm=' ';
  input species:$20. trt:$20. year month day N_detections;
  detection_dt=mdy(month,day,year);
  format detection_dt date9.;
  drop year month day;
  datalines;
barswa corn 2020 6 9 2
barswa switchgrass 2020 6 9 1
bnhcow corn 2020 6 9 56
bnhcow switchgrass 2020 6 9 9
comyel corn 2020 6 9 1
easpho corn 2020 6 9 2
easpho switchgrass 2020 6 9 2
fiespa corn 2020 6 9 3
fiespa switchgrass 2020 6 9 4
graspa corn 2020 6 9 1
graspa switchgrass 2020 6 9 27
henspa corn 2020 6 9 2
indbun corn 2020 6 9 15
bnhcow corn 2020 6 10 168
bnhcow switchgrass 2020 6 10 15
chiswi switchgrass 2020 6 10 1
easpho corn 2020 6 10 2
easpho switchgrass 2020 6 10 2
fiespa corn 2020 6 10 17
fiespa switchgrass 2020 6 10 15
henspa switchgrass 2020 6 10 1
indbun corn 2020 6 10 75
indbun switchgrass 2020 6 10 9
vesspa corn 2020 6 10 4
vesspa switchgrass 2020 6 10 1
bnhcow corn 2020 6 11 80
bnhcow switchgrass 2020 6 11 30
easpho switchgrass 2020 6 11 1
fiespa corn 2020 6 11 73
fiespa switchgrass 2020 6 11 102
graspa switchgrass 2020 6 11 2
indbun corn 2020 6 11 94
indbun switchgrass 2020 6 11 31
vesspa corn 2020 6 11 7
vesspa switchgrass 2020 6 11 6
;

proc sql;
  create table want as
  select 
    l.species,
    l.trt,
    l.detection_dt,
    coalesce(r.N_detections,0) as N_detections
  from 
    (
      select a.species, a.trt, b.detection_dt
      from
        (
          select distinct species, trt
          from have
        ) a
        ,
        (
          select distinct detection_dt
          from have
        ) b
    ) l

    left join
    have r
    on l.species=r.species and l.trt=r.trt and l.detection_dt=r.detection_dt

    order by species, trt, detection_dt
  ;
quit;

proc means noprint data=want;
  where upcase(species) in ('BARSWA','BNHCOW','CHISWI','CLISWA','COMYEL','DICKCI','EASKIN',
                            'EASPHO','FIESPA','GRASPA','HENSPA','INDBUN','PURMAR','SEDWRE','VESSPA');
  by species trt detection_dt;
  var N_detections;
  /*var prob;*/
  output out=species_trt_day n=N_detections;
run;

 

 

kelagory
Calcite | Level 5

Patrick:

 

That worked beautifully! I made some minor modifications and verified with the last print statement that the data included all of the missing zeroes.


data select_species; set prob75;
if sensor in ('Swift2' 'Swift4' 'Swift6') then trt='switchgrass'; else trt='corn';
if species in ('barswa' 'bnhcow' 'chiswi' 'cliswa' 'comyel' 'dickci' 'easkin' 'easpho' 'fiespa' 'graspa' 'henspa' 'indbun' 'purmar' 'sedwre' 'vesspa');
proc sort; by species trt date;
proc means noprint; by species trt date; var prob;
output out=species_trt_date n=N_detect;

run;

data species_trt_date; set species_trt_date;

proc sql;
create table all_species as
select
l.species,
l.trt,
l.date,
coalesce(r.N_detect,0) as N_detect
from
(
select a.species, a.trt, b.date
from
(
select distinct species, trt
from species_trt_date
) a
,
(
select distinct date
from species_trt_date
) b
) l

left join
species_trt_date r
on l.species=r.species and l.trt=r.trt and l.date=r.date

order by species, trt, date
;
quit;

data missing; set all_species;
N_detect=0;

data add_missing; set species_trt_date missing;
proc sort; by species trt date;
proc means noprint; by species trt date; var N_detect;
output out=add_missing sum=N_detect;

proc print data=add_missing;
run;

Thanks!

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 1980 views
  • 0 likes
  • 4 in conversation