I want to test whether one type of tree differs from another.
Currently I have 4 tree types: I, nI, n1 and n2. I want to reduce these to 2 types: I and N.
I also want to compare I to all the others, that is, to combine nI, n1, and n2 into a single level, N.
This is what I wrote:
data AllEvents;
set import;
if Phase=1;
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';....
and SAS returned an error:
This is invalid SAS syntax:
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';
To check a variable against multiple values, use the IN operator, i.e. if Tree in (value1, value2, ...).
To set the new value you need to assign it to a variable, either a new variable or an existing one.
Personally, I always include a catch-all value as well, so you can verify that your code handled every value.
It should be:
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
else Tree_New='O';
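To make the verification step concrete, here is a minimal sketch that assumes the dataset and variable names used in this thread (import, AllEvents, Tree, Tree_New). The LENGTH statement and the PROC FREQ cross-check are my additions, not part of the original answer.

```sas
/* Sketch: recode Tree into two levels, with a catch-all for anything unexpected. */
data AllEvents;
    set import;
    length Tree_New $1;                         /* declare the new variable up front */
    if Tree in ('I' 'nI') then Tree_New = 'I';
    else if Tree in ('n1' 'n2') then Tree_New = 'N';
    else Tree_New = 'O';                        /* flags values the code did not expect */
run;

/* Cross-tabulate old against new values to confirm the mapping. */
proc freq data=AllEvents;
    tables Tree*Tree_New / missing norow nocol nopercent;
run;
```

If the table shows any rows with Tree_New='O', the source data contains a value (or a spelling) that the IN lists do not cover.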
If you want to write an IF statement that tests for multiple values, use:
if Tree in ('I' 'nI') then variable = 'I';
However, you can likely accomplish this without creating any variables or changing your data. Custom formats will group data for analysis procedures.
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
proc freq data =import;
tables tree;
format tree tree_I_N.; /*<= says to group the data according to the values in the format*/
run;
You can create other formats the same way. Note that the values are case sensitive, so nI, NI and ni are three different values.
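If the capitalization in the spreadsheet turns out to be inconsistent (an assumption, nothing in this thread says it is), one possible workaround is to normalize the value before comparing it. This sketch uses the standard UPCASE and STRIP functions; the dataset and variable names are the ones from this thread.

```sas
/* Sketch: case-insensitive recode, assuming nI, NI and ni should all be treated alike. */
data AllEvents;
    set import;
    length Tree_New $1;
    /* STRIP removes leading/trailing blanks, UPCASE folds everything to upper case */
    if upcase(strip(Tree)) in ('I' 'NI') then Tree_New = 'I';
    else if upcase(strip(Tree)) in ('N1' 'N2') then Tree_New = 'N';
    else Tree_New = 'O';                        /* catch-all for unexpected values */
run;
```

Only do this if the differently cased spellings really mean the same tree type; otherwise keep the exact, case-sensitive comparison.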
A wild thought, prompted by your incoming dataset name 'import': if you want to handle this at the source level, you can recode the values with an informat as they are read:
Proc format;
invalue $ Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
data raw;
input tree :$tree_i_n2. @@;
cards;
I nI n1 n2
;
Proc Format snippet partially stolen from @ballardw :).
Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope", while the rest, including Tree, are categorical variables.
After my PROC IMPORT step I have a DATA step that contains:
data AllEvents;
set import;
if Phase=1;
if Gas='Nitrous Oxide';
if slope<=0.01 and slope>=-.003;
logN2O=log(0.1+Slope);
run;
Then I have a PROC GLM step to analyze the effects of the categorical variables on Slope, which is now named "logN2O".
If I include:
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
or
if Tree in ('I' 'nI') then variable = 'I';
in the DATA step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.
If I run a separate PROC FORMAT step:
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
I get an error: ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.
Show all of your code and log, not piecewise.
This is the whole code:
%web_drop_table(WORK.IMPORT);
FILENAME FILEREF '/folders/myshortcuts/MyFolders/AllEvents.xlsx';
PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
run;
proc sort data=AllEvents;
by Block Fert Nfix Tree Distance;
run;
/*proc print data=AllEvents;
run;*/
Proc sort;
by Fert Nfix;
run;
proc univariate Normal Plot data=AllEvents;
/*by Fert Nfix;*/
var Slope logCH4;
Histogram Slope logCH4/Normal;
QQPLOT Slope logCH4;
output out=AllEvents2;
run;
proc sort data=AllEvents;
by Gas Phase Block Fert Nfix Tree Distance;
run;
proc glm;
class Block Fert Nfix Tree Distance;
Model LogCH4=block Fert Tree Distance Block*Fert Block*Tree Fert* Tree Tree*Distance;
Test h=Fert e=Fert*Tree;
lsmeans Fert Tree Distance Tree*Distance/pdiff adjust=tukey;
run;
Replace your DATA step with this:
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
run;
Then in the remaining code, replace all references to Tree with Tree_New.
That's all.
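For the PROC GLM step, that substitution might look like the sketch below. It assumes the model from the posted code stays otherwise unchanged; the explicit data= option and the closing quit; are my additions.

```sas
/* Sketch: the posted GLM step with every reference to Tree replaced by Tree_New. */
proc glm data=AllEvents;
    class Block Fert Nfix Tree_New Distance;
    model logCH4 = Block Fert Tree_New Distance
                   Block*Fert Block*Tree_New Fert*Tree_New Tree_New*Distance;
    test h=Fert e=Fert*Tree_New;
    lsmeans Fert Tree_New Distance Tree_New*Distance / pdiff adjust=tukey;
run;
quit;
```

If the CLASS statement still lists Tree, the analysis will keep reporting all four original levels, which matches the behaviour described earlier in the thread.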
Sorry, that was a typo. When a format applies to a character variable, its name must be preceded by a $, both where it is defined (value $Tree_I_N) and where it is used (format Tree $Tree_I_N.;).
No data means no testing of the code against the data.
PROC GLM will use the grouped values if the format is applied in the procedure. Again: format Tree $Tree_I_N.;
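Put together, the format-based approach could look like the following sketch. It is untested against the actual data and reuses the model from the posted code; the explicit data= option and the closing quit; are my additions.

```sas
/* Character format: note the $ in front of the name. */
proc format;
    value $Tree_I_N
        'I', 'nI'  = 'I'
        'n1', 'n2' = 'N';
run;

/* Sketch: Tree itself is unchanged; the FORMAT statement makes GLM     */
/* build the class levels from the formatted values (I and N).          */
proc glm data=AllEvents;
    class Block Fert Nfix Tree Distance;
    format Tree $Tree_I_N.;
    model logCH4 = Block Fert Tree Distance
                   Block*Fert Block*Tree Fert*Tree Tree*Distance;
    test h=Fert e=Fert*Tree;
    lsmeans Fert Tree Distance Tree*Distance / pdiff adjust=tukey;
run;
quit;
```

With this variant no new variable is needed, and the original four-level Tree remains available for any other analysis.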