combining levels within Variable

01-13-2017 02:07 PM

I want to compare whether one type of tree is different from another.

Currently I have 4 tree types: I, nI, n1, and n2. I want to reduce these to 2 types: I and N.

I also want to compare I to all others, to combine nI, n1, and n2 into a single level, N.

This is what I wrote:

data AllEvents;

set import;

if Phase=1;

if Tree='I' 'nI' then 'I';

if Tree='n1' n2' then 'N';....

and SAS returned an error:

ERROR 388-185: Expecting an arithmetic operator.

ERROR 200-322: The symbol is not recognized and will be ignored.

How can I write this so that both I and nI will be read as I, and both n1 and n2 will be read as N?

Accepted Solutions

Solution

01-13-2017 04:01 PM

01-13-2017 03:50 PM

Replace your DATA step with this:

```
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
/* collapse the four tree types into two levels */
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
run;
```

Then in the remaining code, replace all references to Tree with Tree_New.

That's all.

All Replies

01-13-2017 02:19 PM - edited 01-13-2017 02:20 PM

This is invalid SAS syntax:

```
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';
```

To check for multiple values in a comparison, use the IN operator, i.e. if TREE in (value1, value2, ...).

To set the new value you need to assign it to a variable, either a new variable or the existing one.

Personally, I always add a final ELSE with a catch-all value as well, so you can verify that your code handled every level.

It should be:

```
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
else Tree_New='O';
```
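To show how the catch-all helps with verification, here is a minimal, untested sketch. The dataset names import_demo and recoded, and the extra value xx, are made up for illustration and are not from the original data.

```sas
/* Hypothetical toy data standing in for the imported dataset */
data import_demo;
    input Tree $;
    datalines;
I
nI
n1
n2
xx
;
run;

data recoded;
    set import_demo;
    length Tree_New $1;                      /* fix the length before the first assignment */
    if Tree in ('I', 'nI') then Tree_New = 'I';
    else if Tree in ('n1', 'n2') then Tree_New = 'N';
    else Tree_New = 'O';                     /* catch-all flags unexpected values */
run;

/* Cross-tabulate old against new values to check the mapping */
proc freq data=recoded;
    tables Tree*Tree_New / list missing;
run;
```

Any row showing Tree_New='O' in the PROC FREQ listing points to a value of Tree that the IF/ELSE chain did not anticipate.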

01-13-2017 02:19 PM

If you want to test for multiple values in an IF statement, use:

if Tree in ('I' 'nI') then variable = 'I';

However you can likely accomplish this without creating any variables or changing your data. Custom formats will group data for analysis procedures.

Proc format;

value Tree_I_N

"I", "nI" = "I"

"n1", "n2" = "N"

;

run;

proc freq data =import;

tables tree;

format tree tree_I_N.; /*<= says to group the data according to the values in the format*/

run;

You can create other formats for other groupings. Note that the values are case sensitive, so nI, NI, and ni are three different values.

01-13-2017 03:01 PM

A wild thought hinted by your incoming dataset name 'import'. If you want to work that out at the source level:

```
Proc format;
invalue $ Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;
data raw;
input tree :$tree_i_n2. @@;
cards;
I nI n1 n2
;
```

Proc Format snippet partially stolen from @ballardw .

01-13-2017 03:07 PM

Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope", while the rest, including Tree, are categorical variables.

After my PROC IMPORT step I have my DATA step, which contains:

data AllEvents;

set import;

if Phase=1;

if Gas='Nitrous Oxide';

if slope<=0.01 and slope>=-.003;

logN2O=log(0.1+Slope);

run;

Then I have a PROC GLM step to test for effects of the categorical variables on Slope, which is now named "logN2O".

If I include:

```
if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2') then Tree_New='N';
```

or

if Tree in ('I' 'nI') then variable = 'I';

in the DATA step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.

If I run a separate PROC FORMAT step:

Proc format;

value Tree_I_N

"I", "nI" = "I"

"n1", "n2" = "N"

;

run;

I get an error: ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

01-13-2017 03:23 PM

Show all of your code and log, not piecewise.

01-13-2017 03:30 PM

This is the whole code:

%web_drop_table(WORK.IMPORT);

FILENAME FILEREF '/folders/myshortcuts/MyFolders/AllEvents.xlsx';

PROC IMPORT DATAFILE=FILEREF

DBMS=XLSX

OUT=WORK.IMPORT;

GETNAMES=YES;

RUN;

data AllEvents;

set import;

if Phase=1;

if Gas='Methane';

if slope<=0.0668 and slope>=-.03;

logCH4=log(0.1+Slope);

run;

proc sort data=AllEvents;

by Block Fert Nfix Tree Distance;

run;

/*proc print data=AllEvents;

run;*/

Proc sort;

by Fert Nfix;

run;

proc univariate Normal Plot data=AllEvents;

/*by Fert Nfix;*/

var Slope logCH4;

Histogram Slope logCH4/Normal;

QQPLOT Slope logCH4;

output out=AllEvents2;

run;

proc sort data=AllEvents;

by Gas Phase Block Fert Nfix Tree Distance;

run;

proc glm;

class Block Fert Nfix Tree Distance;

Model LogCH4=block Fert Tree Distance Block*Fert Block*Tree Fert* Tree Tree*Distance;

Test h=Fert e=Fert*Tree;

lsmeans Fert Tree Distance Tree*Distance/pdiff adjust=tukey;

run;

01-13-2017 03:44 PM - edited 01-13-2017 04:00 PM

AaronJ wrote:

Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope", while the rest, including Tree, are categorical variables.

After my PROC IMPORT step I have my DATA step, which contains:

data AllEvents;

set import;

if Phase=1;

if Gas='Nitrous Oxide';

if slope<=0.01 and slope>=-.003;

logN2O=log(0.1+Slope);

run;

Then I have a PROC GLM step to test for effects of the categorical variables on Slope, which is now named "logN2O".

If I include:

`if Tree IN ('I' 'nI') then Tree_New='I'; else if Tree IN ('n1' 'n2') then Tree_New='N';`

or

if Tree in ('I' 'nI') then variable = 'I';

in the DATA step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.

If I run a separate PROC FORMAT step:

Proc format;

value Tree_I_N

"I", "nI" = "I"

"n1", "n2" = "N"

;

run;

I get an error: ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

Sorry, a typo: a format for character values needs a $ preceding its name, both where it is defined (value $Tree_I_N) and where it is used.

No data, so I could not test the code against your data.

PROC GLM will use the grouped values if the format is applied in the procedure. Again: format Tree $Tree_I_N.;
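Putting the corrected pieces together, a minimal sketch might look like the following. The dataset demo, the response y, and its values are invented for illustration, and I have not run this against the real data.

```sas
/* Character format: the $ prefix is required because Tree holds text */
proc format;
    value $Tree_I_N
        'I', 'nI'  = 'I'
        'n1', 'n2' = 'N';
run;

/* Hypothetical toy data, for illustration only */
data demo;
    input Tree $ y;
    datalines;
I 1.2
nI 1.5
n1 2.1
n2 2.4
I 1.1
n1 2.0
nI 1.4
n2 2.6
;
run;

/* CLASS levels are formed from the formatted values of Tree,
   so the analysis should see two levels (I and N) rather than four */
proc glm data=demo;
    class Tree;
    model y = Tree;
    format Tree $Tree_I_N.;
run;
quit;
```

The stored values of Tree are unchanged; only the grouping used by the procedure differs, so the same dataset can be analyzed with other groupings by swapping the format.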