
combining levels within Variable

Occasional Contributor
Posts: 14

I want to compare whether one type of tree is different from another.

Currently I have 4 tree types: I, nI, n1, and n2. I want to reduce these to 2 types: I and N.

I also want to compare I to all the others, i.e. combine nI, n1, and n2 into a single level, N.

This is what I wrote:

 

data AllEvents;
set import;
if Phase=1;
if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';....

 

and SAS returned these errors:

ERROR 388-185: Expecting an arithmetic operator.
 
ERROR 200-322: The symbol is not recognized and will be ignored.
 
How can I write this so that both I and nI are read as I, and both n1 and n2 are read as N?

Accepted Solutions
Solution
01-13-2017 04:01 PM
Super User
Posts: 17,818

Re: combining levels within Variable

Replace your DATA step with this:

data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);

if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2')  then Tree_New='N';

run;

Then, in the remaining code (the PROC SORT BY statements and the PROC GLM CLASS, MODEL, TEST, and LSMEANS statements), replace all references to Tree with Tree_New.

That's all.
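For illustration, here is roughly how the PROC GLM step from the full program posted in this thread would look after that replacement. This is a sketch that copies the model statements from that program and only swaps Tree for Tree_New; the same substitution applies to the BY statements in its PROC SORT steps. Note that any row whose Tree value matches neither IN list gets a blank (missing) Tree_New, and PROC GLM drops observations with a missing CLASS variable.

```sas
/* Sketch: the posted PROC GLM step with Tree replaced by Tree_New, */
/* so the Tree effect now has two levels (I and N) instead of four. */
proc glm data=AllEvents;
   class Block Fert Nfix Tree_New Distance;
   model logCH4 = Block Fert Tree_New Distance Block*Fert Block*Tree_New
                  Fert*Tree_New Tree_New*Distance;
   test h=Fert e=Fert*Tree_New;
   lsmeans Fert Tree_New Distance Tree_New*Distance / pdiff adjust=tukey;
run;
quit;
```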



All Replies
Super User
Posts: 17,818

Re: combining levels within Variable


This is invalid SAS syntax:

if Tree='I' 'nI' then 'I';
if Tree='n1' n2' then 'N';

To check for multiple values in a comparison, use the IN operator, i.e. if TREE in (value1, value2, ...)

To set the new value you need to assign it to a variable, either a new variable or old variable.

Personally, I always include a catch-all value as well, so you can verify that every original value was mapped.


It should be:

 

if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2')  then Tree_New='N';
else Tree_New='O';
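One way to use that extra value for verification is a cross-tabulation of the old and new variables. This sketch assumes the DATA step has created Tree_New in the AllEvents data set; any Tree value listed next to Tree_New='O' was not matched by either IN list (for example, because of a typo or different capitalization).

```sas
/* Verification sketch: list each original Tree value next to the */
/* Tree_New group it was assigned. LIST prints one row per         */
/* combination; MISSING also reports missing values.               */
proc freq data=AllEvents;
   tables Tree*Tree_New / list missing;
run;
```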

 

 

Super User
Posts: 10,497

Re: combining levels within Variable

If you want to write an IF statement that tests for multiple values, use IN:

if Tree in ('I' 'nI') then variable = 'I';
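A SELECT group is another common way to write the same multi-branch recode in a DATA step (an alternative construct, not one used elsewhere in this thread). This sketch reuses the data set names from the question; Tree_New is a new variable, and the LENGTH statement declares its length explicitly rather than letting the first assignment set it.

```sas
data AllEvents;
   set import;
   length Tree_New $1;   /* every value assigned below is one character long */
   select (Tree);
      when ('I', 'nI')  Tree_New = 'I';
      when ('n1', 'n2') Tree_New = 'N';
      otherwise         Tree_New = 'O';   /* flags any unexpected Tree value */
   end;
run;
```

If no WHEN condition matches and there is no OTHERWISE statement, SAS stops the DATA step with an error, so the OTHERWISE line also acts as a safety net.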

 

However, you can likely accomplish this without creating any new variables or changing your data: custom formats will group the data for analysis procedures.

 

Proc format;
value $Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;

proc freq data=import;
   tables tree;
   format tree $tree_I_N.; /*<= says to group the data according to the values in the format*/
run;

 

You can create other formats for other groupings. Note that the values are case sensitive, so nI is not equal to NI, which is not equal to ni.
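Formats, like the IN comparisons earlier in this thread, match values exactly. If the raw Tree values might arrive with inconsistent capitalization (an assumption, not something reported here), one option is to normalize them in the DATA step with the UPCASE and STRIP functions before comparing. Tree_New is the same new variable used in the replies above.

```sas
data AllEvents;
   set import;
   /* Uppercase and strip blanks before comparing, so that 'nI', 'NI', */
   /* and 'ni' all match the same list entry ('NI').                   */
   if upcase(strip(Tree)) in ('I' 'NI') then Tree_New = 'I';
   else if upcase(strip(Tree)) in ('N1' 'N2') then Tree_New = 'N';
   else Tree_New = 'O';
run;
```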

Respected Advisor
Posts: 3,124

Re: combining levels within Variable

A wild thought, prompted by your input dataset name 'import': if you want to handle this at the source level, when the raw data are read, you can use a custom informat:

Proc format;
invalue $Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;


data raw;
input tree :$tree_i_n2. @@;
cards;
I nI n1 n2
;
run;

 

Proc Format snippet partially stolen from @ballardw :)
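Because the data in this thread arrive through PROC IMPORT rather than an INPUT statement, the same informat could instead be applied to the already-imported character variable with the INPUT function. A minimal sketch, assuming the $Tree_I_N informat from the PROC FORMAT step above has been created in the session and that Tree is a character variable in WORK.IMPORT; Tree_New is a hypothetical new variable name.

```sas
data AllEvents;
   set import;
   /* Read the character value of Tree through the custom informat: */
   /* "I" and "nI" become "I"; "n1" and "n2" become "N".             */
   Tree_New = input(Tree, $Tree_I_N2.);
run;
```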

Occasional Contributor
Posts: 14

Re: combining levels within Variable

Perhaps I stated my situation incorrectly. The only numeric variable in my dataset is "Slope"; the rest, including Tree, are categorical variables.

After my PROC IMPORT step I have a DATA step containing:

data AllEvents;
set import;
if Phase=1;
if Gas='Nitrous Oxide';
if slope<=0.01 and slope>=-.003;
logN2O=log(0.1+Slope);
run;

Then I have a PROC GLM step that tests the effects of the categorical variables on Slope, analyzed as the log-transformed variable "logN2O".

If I include:

if Tree IN ('I' 'nI') then Tree_New='I';
else if Tree IN ('n1' 'n2')  then Tree_New='N';

 or 

if Tree in ('I' 'nI') then variable = 'I';

in the DATA step, the GLM analysis still includes I, nI, n1, and n2, not just I and N.

If I add a separate PROC FORMAT step:

Proc format;
value Tree_I_N
"I", "nI" = "I"
"n1", "n2" = "N"
;
run;

I get an error:  ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

 

Super User
Posts: 17,818

Re: combining levels within Variable

Show all of your code and the log, not piecemeal.

Occasional Contributor
Posts: 14

Re: combining levels within Variable

This is the whole code:

 

%web_drop_table(WORK.IMPORT);


FILENAME FILEREF '/folders/myshortcuts/MyFolders/AllEvents.xlsx';

PROC IMPORT DATAFILE=FILEREF
DBMS=XLSX
OUT=WORK.IMPORT;
GETNAMES=YES;
RUN;
data AllEvents;
set import;
if Phase=1;
if Gas='Methane';
if slope<=0.0668 and slope>=-.03;
logCH4=log(0.1+Slope);
run;
proc sort data=AllEvents;
by Block Fert Nfix Tree Distance;
run;
/*proc print data=AllEvents;
run;*/
Proc sort;
by Fert Nfix;
run;
proc univariate Normal Plot data=AllEvents;
/*by Fert Nfix;*/
var Slope logCH4;
Histogram Slope logCH4/Normal;
QQPLOT Slope logCH4;
output out=AllEvents2;
run;
proc sort data=AllEvents;
by Gas Phase Block Fert Nfix Tree Distance;
run;
proc glm;
class Block Fert Nfix Tree Distance;
Model LogCH4=block Fert Tree Distance Block*Fert Block*Tree Fert* Tree Tree*Distance;
Test h=Fert e=Fert*Tree;
lsmeans Fert Tree Distance Tree*Distance/pdiff adjust=tukey;
run;


Super User
Posts: 10,497

Re: combining levels within Variable

AaronJ wrote:

[...]

I get an error:  ERROR: The quoted string 'I' is not acceptable to a numeric format or informat.

Sorry, a typo: because Tree is a character variable, the format name needs a $ prefix, i.e. $Tree_I_N, both in the VALUE statement and wherever the format is used.

No data => no code tested against the data.

 

PROC GLM will use the grouped values if the format is applied within the procedure. Again: format Tree $Tree_I_N.;
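Putting the correction together: a sketch of the corrected character format followed by the PROC GLM step from the full program posted in this thread, with the format applied to Tree. It assumes Tree is character (as the error message indicates) and that the AllEvents data set and logCH4 come from that program.

```sas
/* Character format: note the $ prefix on the name. */
proc format;
   value $Tree_I_N
      "I", "nI"  = "I"
      "n1", "n2" = "N";
run;

/* The FORMAT statement makes GLM form CLASS levels from the */
/* formatted values, so Tree is analyzed as two levels.      */
proc glm data=AllEvents;
   class Block Fert Nfix Tree Distance;
   model logCH4 = Block Fert Tree Distance Block*Fert Block*Tree
                  Fert*Tree Tree*Distance;
   test h=Fert e=Fert*Tree;
   lsmeans Fert Tree Distance Tree*Distance / pdiff adjust=tukey;
   format Tree $Tree_I_N.;
run;
quit;
```

Any Tree value not listed in the format is displayed unchanged and therefore forms its own level, so a PROC FREQ on the formatted variable is a quick way to confirm that only I and N remain.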

 
