Solved
Contributor
Posts: 27

# combining levels within Variable

I want to compare whether one type of tree is different from another.

Currently I have 4 tree types: I, nI, n1 and n2. I want to reduce these down to 2 types: I and N.

I also want to compare I to all others, i.e., combine nI, n1, and n2 into a single level, N.

This is what I wrote:

data AllEvents;
set import;
if Phase=1;
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';....

and SAS returned an error:

ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

How can I write this so that both I and nI will be read as I, and both n1 and n2 will be read as N?

Accepted Solutions
Solution
‎01-13-2017 04:01 PM
Super User
Posts: 23,724

## Re: combining levels within Variable

Replace your DATA step with this:

```
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);

if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';

run;
```

Then in the remaining code, replace all references to Tree with Tree_New.

That's all.
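
As a rough sketch of that last step, the snippet below builds a small simulated dataset, creates Tree_New as above, and passes it to PROC GLM in place of Tree. The dataset name `demo`, the response `y`, the counter `rep`, and the seed are all made up for illustration; they are not the poster's data. The model is deliberately reduced to a single factor. In the full model, Tree_New would be substituted wherever Tree appears in the CLASS, MODEL, TEST, and LSMEANS statements.

```sas
/* Simulated data for illustration only; demo, y and rep are hypothetical */
data demo;
    call streaminit(2017);
    length Tree $2 Tree_New $1;
    do Tree = 'I', 'nI', 'n1', 'n2';
        /* same recoding as in the DATA step above */
        if Tree in ('I', 'nI') then Tree_New = 'I';
        else if Tree in ('n1', 'n2') then Tree_New = 'N';
        do rep = 1 to 5;
            y = rand('normal');
            output;
        end;
    end;
run;

proc glm data=demo;
    class Tree_New;               /* two levels: I and N */
    model y = Tree_New;
    lsmeans Tree_New / pdiff;
run;
quit;
```

If the CLASS statement still names Tree, the analysis keeps all four original levels, because the recoded values live only in Tree_New.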

All Replies
Super User
Posts: 23,724

## Re: combining levels within Variable

This is invalid SAS syntax:

```
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';
```

To check for multiple values in a comparison, use the IN operator, i.e. if TREE in (value1, value2, ...).

To set the new value you need to assign it to a variable, either a new variable or the existing one.

Personally, I always include a catch-all ELSE value as well, so you can verify that your code classified every record as expected.

It should be:

```
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
else Tree_New='O';
```
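
One way to do that verification is to cross-tabulate the original values against the new ones. The sketch below assumes a small hypothetical dataset `demo` that includes one unexpected code (`x9`); neither the dataset nor that code comes from the original post.

```sas
/* Hypothetical sample data: the four known codes plus one unexpected value */
data demo;
    input Tree $;
    datalines;
I
nI
n1
n2
x9
;
run;

data demo_grouped;
    set demo;
    length Tree_New $1;           /* declare the length before the first assignment */
    if Tree in ('I', 'nI') then Tree_New = 'I';
    else if Tree in ('n1', 'n2') then Tree_New = 'N';
    else Tree_New = 'O';          /* catch-all for anything unexpected */
run;

/* One row per Tree / Tree_New combination, so a wrong mapping is easy to spot */
proc freq data=demo_grouped;
    tables Tree*Tree_New / list missing;
run;
```

Any row with Tree_New equal to O points at a value the IF/ELSE chain did not anticipate.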

Super User
Posts: 13,542

## Re: combining levels within Variable

To test for multiple values in an IF statement, use the IN operator:

```
if Tree in ('I' 'nI') then variable = 'I';
```

However, you can likely accomplish this without creating any variables or changing your data. Custom formats will group data for analysis procedures.

```
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;

proc freq data =import;
tables tree;
format tree tree_I_N.; /*<= says to group the data according to the values in the format*/
run;
```

You can create other formats for other groupings. Note that the values are case sensitive, so nI, NI, and ni are three different values.
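
To see how case affects grouping, a quick check like the one below lists the distinct raw spellings. The dataset `demo` and its values are made up for illustration, and UPCASE is only one possible way to normalize case, not something the data necessarily needs.

```sas
/* Hypothetical data: the same code typed with three different casings */
data demo;
    input Tree $;
    datalines;
nI
NI
ni
;
run;

/* Each spelling is reported as its own level */
proc freq data=demo;
    tables Tree;
run;

/* Optional normalization: all three rows become NI */
data demo_clean;
    set demo;
    Tree_Upper = upcase(Tree);
run;
```

Running PROC FREQ on the raw variable first is a cheap way to find out which spellings a format or IN list actually has to cover.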

Posts: 3,167

## Re: combining levels within Variable

A wild thought, prompted by your incoming dataset name 'import': if you want to handle this at the source level, you can regroup the values with an informat as the data are read:

```
Proc format;
invalue $Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;

data raw;
input tree :$tree_i_n2. @@;
cards;
I nI n1 n2
;
```

Proc Format snippet partially stolen from @ballardw.

Contributor
Posts: 27

## Re: combining levels within Variable

Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope"; the rest, including Tree, are categorical variables.

After my PROC IMPORT statement I have my DATA step:

```
data AllEvents;
set import;
if Phase=1;
if Gas='Nitrous Oxide';
if slope<=0.01 and slope>=-.003;
logN2O=log(0.1+Slope);
run;
```

Then I have a PROC GLM step to analyze the effects of the categorical variables on Slope, which is now named "logN2O".

If I include

```
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
```

or

```
if Tree in ('I' 'nI') then variable = 'I';
```

in the DATA step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.

If I add a separate PROC FORMAT step:

```
Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
```

I get an error:  ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

Super User
Posts: 23,724

## Re: combining levels within Variable

Show all of your code and log, not piecewise.

Contributor
Posts: 27

## Re: combining levels within Variable

This is the whole code:

```
%web_drop_table(WORK.IMPORT);

FILENAME FILEREF '/folders/myshortcuts/MyFolders/AllEvents.xlsx';

PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
run;
proc sort data=AllEvents;
by Block Fert Nfix Tree Distance;
run;
/*proc print data=AllEvents;
run;*/
Proc sort;
by Fert Nfix;
run;
proc univariate Normal Plot data=AllEvents;
/*by Fert Nfix;*/
var Slope logCH4;
Histogram Slope logCH4/Normal;
QQPLOT Slope logCH4;
output out=AllEvents2;
run;
proc sort data=AllEvents;
by Gas Phase Block Fert Nfix Tree Distance;
run;
proc glm;
class Block Fert Nfix Tree Distance;
Model LogCH4=block Fert Tree Distance Block*Fert Block*Tree Fert* Tree Tree*Distance;
Test h=Fert e=Fert*Tree;
lsmeans Fert Tree Distance Tree*Distance/pdiff adjust=tukey;
run;
```

Super User
Posts: 13,542

## Re: combining levels within Variable

AaronJ wrote:

If I add a separate PROC FORMAT step:

Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;

I get an error:  ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

Sorry, that was a typo. The name of a character format must be preceded by a $, both where the format is defined (value $Tree_I_N) and where it is used ($Tree_I_N.).

With no data to work from, I could not test the code.

Proc GLM will use the grouped values if the format is applied in the procedure. Again: Format Tree $Tree_I_N. ;
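
Put together, the format-based approach might look like the sketch below. It runs on simulated data rather than the poster's spreadsheet: the dataset `sim`, the response `y`, the counter `rep`, and the seed are hypothetical, and the model is cut down to a single factor for brevity.

```sas
/* Character format: note the $ in front of the name */
proc format;
    value $Tree_I_N
        'I', 'nI'  = 'I'
        'n1', 'n2' = 'N';
run;

/* Simulated data for illustration only; sim, y and rep are hypothetical */
data sim;
    call streaminit(2017);
    length Tree $2;
    do Tree = 'I', 'nI', 'n1', 'n2';
        do rep = 1 to 5;
            y = rand('normal');
            output;
        end;
    end;
run;

proc glm data=sim;
    class Tree;
    format Tree $Tree_I_N.;       /* CLASS levels are built from the formatted values */
    model y = Tree;
    lsmeans Tree / pdiff;
run;
quit;
```

The stored values of Tree are unchanged, so removing the FORMAT statement restores the four-level analysis without touching the data.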

☑ This topic is solved.