I want to compare whether one type of tree is different from another.
Currently I have 4 tree types: I, nI, n1 and n2. I want to reduce these down to 2 types: I and N.
I also want to compare I to all others, i.e. to combine nI, n1, and n2 into a single level, N.
This is what I wrote:
data AllEvents;
set import;
if Phase=1;
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';....
and SAS returned an error:
This is invalid SAS syntax:
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';
To check for multiple values in a comparison, use the IN operator, i.e. if TREE in (value1, value2, ...).
To set the new value you need to assign it to a variable, either a new variable or an existing one.
Personally, I always include another value as well so you can verify your code.
It should be:
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
else Tree_New='O';
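As a rough sketch of how the catch-all value helps with verification, the recode can be followed by a cross-tabulation of the old variable against the new one. The LENGTH statement and the PROC FREQ check are my additions, not part of the original code; the dataset and variable names are the ones used in this thread.

```sas
data AllEvents;
  set import;
  length Tree_New $1;   /* assumption: one character is enough for I, N and O */
  if Tree in ('I', 'nI') then Tree_New = 'I';
  else if Tree in ('n1', 'n2') then Tree_New = 'N';
  else Tree_New = 'O';  /* catch-all: flags unexpected or misspelled values */
run;

/* Cross-tabulate old against new to confirm each value landed where intended */
proc freq data=AllEvents;
  tables Tree*Tree_New / list missing;
run;
```

Any row with Tree_New='O' in that table points to a Tree value the IF statements did not anticipate, for example a difference in case.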
If you want to write an IF statement that tests for multiple values, use IN:
if Tree in ('I' 'nI') then variable = 'I';
However you can likely accomplish this without creating any variables or changing your data. Custom formats will group data for analysis procedures.
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
proc freq data =import;
tables tree;
format tree tree_I_N.; /*<= says to group the data according to the values in the format*/
run;
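You can create other formats the same way for other groupings. Note that the values are case sensitive, so nI is not equal to NI, which is not equal to ni.
A wild thought, prompted by your incoming dataset name 'import': if you want to do the regrouping at the source level, when the data is read in, an informat can do it: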
Proc format;
invalue $ Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
data raw;
input tree :$tree_i_n2. @@;
cards;
I nI n1 n2
;
Proc Format snippet partially stolen from @ballardw :).
Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope", while the rest, including Tree, are categorical variables.
After my PROC IMPORT statement I have my data step, in which I have:
data AllEvents;
set import;
if Phase=1;
if Gas='Nitrous Oxide';
if slope<=0.01 and slope>=-.003;
logN2O=log(0.1+Slope);
run;
Then I have a PROC GLM step to analyze for differences of the categorical variables on Slope, which is now named "logN2O".
If I include:
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
or
if Tree in ('I' 'nI') then variable = 'I';
in the data step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.
If I begin a separate PROC FORMAT step:
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
I get an error: ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.
Show all of your code and log, not piecewise.
This is the whole code:
%web_drop_table(WORK.IMPORT);
FILENAME FILEREF '/folders/myshortcuts/MyFolders/AllEvents.xlsx';
PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
run;
proc sort data=AllEvents;
by Block Fert Nfix Tree Distance;
run;
/*proc print data=AllEvents;
run;*/
Proc sort;
by Fert Nfix;
run;
proc univariate Normal Plot data=AllEvents;
/*by Fert Nfix;*/
var Slope logCH4;
Histogram Slope logCH4/Normal;
QQPLOT Slope logCH4;
output out=AllEvents2;
run;
proc sort data=AllEvents;
by Gas Phase Block Fert Nfix Tree Distance;
run;
proc glm;
class Block Fert Nfix Tree Distance;
Model LogCH4=block Fert Tree Distance Block*Fert Block*Tree Fert* Tree Tree*Distance;
Test h=Fert e=Fert*Tree;
lsmeans Fert Tree Distance Tree*Distance/pdiff adjust=tukey;
run;
Replace this:
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
run;
Then in the remaining code, replace all references to Tree with Tree_New.
That's all.
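For illustration, this is roughly what the PROC GLM step from the posted code would look like after that substitution. It is only a sketch: the model is copied from the post with Tree swapped for Tree_New, and the explicit data=AllEvents and the closing quit; are my additions.

```sas
proc glm data=AllEvents;
  /* Tree_New has two levels (I and N) instead of the four levels of Tree */
  class Block Fert Nfix Tree_New Distance;
  model LogCH4 = Block Fert Tree_New Distance
                 Block*Fert Block*Tree_New Fert*Tree_New Tree_New*Distance;
  test h=Fert e=Fert*Tree_New;
  lsmeans Fert Tree_New Distance Tree_New*Distance / pdiff adjust=tukey;
run;
quit;
```

The PROC SORT steps that list Tree in their BY statements would need the same substitution.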
Sorry, that was a typo in my PROC FORMAT code. A format for a character variable needs a $ in front of its name, both where it is defined (value $Tree_I_N) and where it is used.
You provided no data, so I could not test the code against it.
PROC GLM will use the grouped values if the format is applied in the procedure. Again: Format Tree $Tree_I_N. ;
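Putting the correction together, a minimal sketch of the format-based approach might look like the following. It has not been run against the actual data; the model is copied from the code posted earlier in the thread, and data=AllEvents and quit; are additions for clarity.

```sas
proc format;
  value $Tree_I_N        /* the $ marks this as a character format */
    "I", "nI"  = "I"
    "n1", "n2" = "N"
  ;
run;

/* Quick check that the format groups the four raw values into two */
proc freq data=AllEvents;
  tables Tree;
  format Tree $Tree_I_N.;
run;

proc glm data=AllEvents;
  class Block Fert Nfix Tree Distance;
  model LogCH4 = Block Fert Tree Distance
                 Block*Fert Block*Tree Fert*Tree Tree*Distance;
  test h=Fert e=Fert*Tree;
  lsmeans Fert Tree Distance Tree*Distance / pdiff adjust=tukey;
  format Tree $Tree_I_N.;   /* CLASS levels are formed from the formatted values */
run;
quit;
```

With this approach the stored values of Tree are left unchanged; only the grouping used by the procedure changes.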