The SAS Output Delivery System and reporting techniques

Creating records to represent species not encountered at plots

Reply
N/A
Posts: 0

Creating records to represent species not encountered at plots

This summer we visited plots throughout PA and counted the # of trees at each subplot. Every time a tree was encountered, we created a record in the database. Now I'm trying to analyze this data, however, for every subplot I need to account for every species that was not encountered (ie. I need to create a record for each species and each subplot that represents a count of zero). So if I visit plot 1 subplot 1 and find tree species#A, B, and D, I enter records in the database as follows:

Record Plot Subplot TreeSpecies
1 1 1 A
2 1 1 B
3 1 1 D

If I continue to plot 3 subplot 2 and find C, and D, at plot 7 subplot 5 I find G, and at plot 7 subplot 6 I find A,B,C, and D, I'll then have:

Record Plot Subplot TreeSpecies
1 1 1 A
2 1 1 B
3 1 1 D
4 3 2 C
5 3 2 D
6 7 5 G
7 7 6 A
8 7 6 B
9 7 6 C
10 7 6 D

So, the species list is now: A,B,C,D,G. To analyze it, I?ll need to create records for each species that wasn?t encountered at a subplot. I?ll need to add the following records:

Record Plot Subplot TreeSpecies Encountered
11 1 1 C N
12 1 1 G N
13 3 2 A N
14 3 2 B N
15 3 2 G N
16 7 5 A N
17 7 5 B N
18 7 5 C N
19 7 5 D N
20 7 6 G N

The question is how? To make it a challenge, the list of species encountered and the plots visited varies from site to site, so I need to generate this list using code or a query, rather than typing a list.

Any ideas?

Richard S. Fritsky
Research Associate
Pennsylvania Cooperative Fish and Wildlife Research Unit
Penn State University

?...conservation is the joyful affirmation that there will be a future. That there will be generations to follow us...it is our responsibility to see that they will inherit a land that still contains its resources to use and its wildernesses to wander..." Ed Mankelow, B.C. Wildlife Federation
SAS Super FREQ
Posts: 8,744

Re: Creating records to represent species not encountered at plots

Richard:
Do you need the record number to be in the data for any particular reason???
cynthia
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

No, I don't need the record number in the data.
Super Contributor
Posts: 260

Re: Creating records to represent species not encountered at plots

Would that program be of any help ?
The trick here is to transpose data, in order to have the complete range of possible values for species. Then fill in the blanks for species not encountered, and transpose back your data.

Regards.
Olivier

DATA work.trees (DROP = record) ;
INPUT Record Plot Subplot TreeSpecies $ ;
encountered = "Y" ;
CARDS ;
1 1 1 A
2 1 1 B
3 1 1 D
4 3 2 C
5 3 2 D
6 7 5 G
7 7 6 A
8 7 6 B
9 7 6 C
10 7 6 D
;
RUN ;
PROC TRANSPOSE DATA = work.trees
OUT = work.species (DROP = _name_) ;
VAR encountered ;
ID TreeSpecies ;
BY plot subPlot ;
RUN ;
DATA work.species ;
SET work.species ;
ARRAY species $ _CHARACTER_ ;
DO OVER species ;
IF MISSING(species) THEN species = "N" ;
END ;
RUN ;
PROC TRANSPOSE DATA = work.species
OUT = work.Encountered_and_not
(RENAME = (_name_ = TreeSpecies
col1 = encountered)) ;
VAR _character_ ;
BY plot subPlot ;
RUN ;
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

Oliver, your program works great. However, for illustrative purposes I listed species using characters A, B, C..., yet in my actual data they are numeric. That seems to cause problems with your program because SAS adds an underscore before numeric variable names (ie. spp 10 becomes variable _10). How can I get around this?
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

This assumes the subplot is unique to the plot.

DATA t;
RETAIN Encountered 'Y'; /* add this to your input data */
Input Record Plot Subplot TreeSpecies $1.;
CARDS;
1 1 1 A
2 1 1 B
3 1 1 D
4 3 2 C
5 3 2 D
6 7 5 G
7 7 6 A
8 7 6 B
9 7 6 C
10 7 6 D
;;;;

PROC SQL;
CREATE TABLE t_all AS
SELECT plot, subplot, treespecies, 'N' AS encountered
FROM
(SELECT DISTINCT treespecies FROM t),
(SELECT DISTINCT plot, subplot FROM t)
;
CREATE TABLE t2 AS
SELECT * FROM t
UNION CORR
(SELECT * FROM t_all b WHERE b.treespecies NOT IN (SELECT treespecies FROM t t0 WHERE b.plot = t0.plot AND b.subplot =t0.subplot))
ORDER BY plot, subplot, treespecies
;
QUIT;
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

Every plot contains subplots numbered 1 through 10, so subplots are not unique to the plots.
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

Then simply replace
(SELECT DISTINCT plot, subplot FROM t)
with
(SELECT DISTINCT plot FROM t),
(SELECT DISTINCT subplot FROM t)

and I think you're out of it
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

RWRIGHT, Thank you very much for your reply. I tried your updated code, but it seems that I end up with a lot of extra records. I'm trying to figure out where these records come from and what they represent. Any ideas?
SAS Super FREQ
Posts: 8,744

Re: Creating records to represent species not encountered at plots

I took a totally different (non-SQL approach) that avoids the issue with transpose. I get a total of 20 rows by the time my program is done.

If your data are more complex than what you represented, then this approach might need some more tweaking after the proc report...

Basically, I'm using the COMPLETEROWS feature of PROC REPORT to figure out which treespecies are missing and then creating an output data set with PROC REPORT. The problem with the SQL solutions and with my original PROC REPORT solution, was that I was getting about 60 rows at first because of COMPLETEROWS trying to make ALL the combos of plot and subplot -- which I avoided by making something I call a "superkey" which is a character var that smushes plot and subplot together into one variable.
[pre]
** note treespecies are 111, 222, 333, 444 and 777;
** but they were A, B, C, D, G in original problem;
** 10 going in...only want 20 coming out;
** assumes that REAL data are structured AS shown below with 1 row for;
** each uniq plot/subplot/species combo;
DATA work.trees (DROP = record) ;
length uniq_p_sp $5;
INPUT Record INPlot INSubplot TreeSpecies $ ;
encountered = 'y';
** make a unique "superkey" that is the combo of plot and subplot in the original file;
uniq_p_sp = put(INplot,z2.)||'~'||put(INsubplot,z2.);
CARDS ;
1 1 1 111
2 1 1 222
3 1 1 444
4 3 2 333
5 3 2 444
6 7 5 777
7 7 6 111
8 7 6 222
9 7 6 333
10 7 6 444
;
RUN ;

** use proc report with completerows to do the work;
** make an output dataset from proc report;
** when n=0, the species was not in the original file;
proc report data=trees nowd completerows out=work.completeset;
title 'proc report';
column uniq_p_sp plot subplot treeSpecies n ;
define uniq_p_sp /group;
define treespecies /group;
define n /'count';
define plot /computed;
define subplot /computed;
compute before uniq_p_sp;
tempvar = uniq_p_sp;
temppl = input(scan(uniq_p_sp,1,'~'),2.);
tempsp = input(scan(uniq_p_sp,2,'~'),2.);
endcomp;
compute plot ;
plot = temppl;
endcomp;
compute subplot ;
subplot = tempsp;
endcomp;
run;

proc print data=completeset;
title 'after proc report';
title2 'need to drop those _BREAK_ lines (from compute before)';
run;

proc sort data=completeset out=work.newtrees(drop=_BREAK_);
by plot subplot TreeSpecies;
where _BREAK_ = ' ';
run;

proc print data=work.newtrees;
title 'after sort';
title2 'n=0 means not in orig file; n=1 means in orig file';
run;
[/pre]

So, take a look and see what you think. I'm such a PROC REPORT and DATA Step person, that I went down that road first. Note that the "final" vars PLOT and SUBPLOT are created in the PROC REPORT step. And the final proc sort gets rid of the _BREAK_ rows that are inserted by PROC REPORT when you do a COMPUTE BEFORE. So, you'd use WORK.NEWTREES for your further reporting -- you could keep the uniq_p_sp var or not. I just left it there.

Probably the SQL is more elegant and there is always a set theory/join method to get what you want, but for something completely different, here's a PROC REPORT solution.

cynthia
N/A
Posts: 0

Re: Creating records to represent species not encountered at plots

The updated code gives you all permutations of plot, subplot, and treespecies.

(SELECT DISTINCT treespecies FROM t) gives all treespecies
(SELECT DISTINCT plot FROM t) gives every plot
(SELECT DISTINCT subplot FROM t) gives every subplot

From your example you had 3 plots, numbered: 1,3, and 7,
4 subplots: 1, 2, 5, 6
and 5 treespecies: A, B, C, D, G
So, from your example you should have gotten 3*4*5, or 60 records, and a Y or N
for each.
which is what I get when I run the code.

The first code differed in that you got permutations by plot and subplot:
you still had 5 treespecies, but now you had 4 combinations of plot, subplot:
(1, 1),(3, 2), (7,5), (7,6) so you get 20 records (5*4) each with a Y or N

PS:
I went back and re-read your initial posting and maybe I misunderstood something. The following change will generate the species in a plot that are/arenot in a subplot:

SELECT plot, subplot, treespecies, 'N' AS encountered
FROM
(SELECT DISTINCT plot, treespecies FROM t),
(SELECT DISTINCT subplot FROM t)
;

Now I get 40 data points from your data.
Message was edited by: rwright at Feb 12, 2007 5:27 PM
Super Contributor
Posts: 260

Re: Creating records to represent species not encountered at plots

Hi all.

Here is an updated version of my tranpose program, for species as numeric codes.
Sorry Cynthia, but it seems that I have developped allergy to the Report procedure ; for SQL, it just didn't came to my mind in the first place, so I went down another road.
Every solution works, yet is different : that's why we all love SAS, don't we ?

Regards,
Olivier

DATA work.trees (DROP = record) ;
INPUT Record Plot Subplot TreeSpecies $ ;
encountered = "Y" ;
specie_name = "var_"!!treeSpecies ;
CARDS ;
1 1 1 1
2 1 1 2
3 1 1 4
4 3 2 3
5 3 2 4
6 7 5 7
7 7 6 1
8 7 6 2
9 7 6 3
10 7 6 4
;
RUN ;
PROC TRANSPOSE DATA = work.trees
OUT = work.species (DROP = _name_) ;
VAR encountered ;
ID specie_name ;
BY plot subPlot ;
RUN ;
DATA work.species ;
SET work.species ;
ARRAY species $ _CHARACTER_ ;
DO OVER species ;
IF MISSING(species) THEN species = "N" ;
END ;
RUN ;
PROC TRANSPOSE DATA = work.species
OUT = work.Encountered_and_not (RENAME = (col1 = encountered)) ;
VAR _character_ ;
BY plot subPlot ;
RUN ;
DATA work.Encountered_and_not (DROP = _name_) ;
SET work.Encountered_and_not ;
TreeSpecies = SUBSTR(_name_,5) ;
RUN ;
SAS Super FREQ
Posts: 8,744

Re: Creating records to represent species not encountered at plots

Olivier:
So sorry about your allergy to PROC REPORT! Sadly, they don't give allergy shots for PROC REPORT. ;-) The thing I love about SAS is that there's usually more than one solution to problem. And, SAS programmers are endlessly inventive!
cynthia
Ask a Question
Discussion stats
  • 12 replies
  • 276 views
  • 0 likes
  • 3 in conversation