grouping numeric AND character

Posts: 0


I want to group observations using a character variable, but some of its values look like numbers. Here is an example:

I want to go from these values:

Obs1 101
Obs2 A1
Obs3 102
Obs4 A2

to these groups:

Obs1 G1
Obs2 G1
Obs3 G2
Obs4 G2

where the first variable (var1) is a character variable.

My code is :

data mytable;
set mytable;
if var1 = ("101" or "A1") then var2 = "G1";
else var2 = G2;

But SAS actually converts 101 and 102 from character to numeric ("NOTE: Character values have been converted to numeric values at the places given by...") and then returns an error when it encounters the "real" character values ("invalid numeric data, 'A1', at the place given by...")!

I need your help to fix this problem!
Thank you
Frequent Contributor
Posts: 81

Re: grouping numeric AND character

Use the statements below:

if var1 in ("101" , "A1") then var2 = "G1";
else var2 = "G2";
Posts: 0

Re: grouping numeric AND character

Thanks a lot, it works! I'm surprised how much the syntax can affect the result, though. I spent hours trying to understand the cause...
Thanks again
Posts: 8,743

Re: grouping numeric AND character

The cause of the error message has to do with what your IF statement was really trying to do. Even though the PROC PRINT output shows VAR1 holding a mix of numbers and characters, VAR1 is a CHARACTER variable. It has to be: SAS has only 2 variable types, character and numeric. If VAR1 were a numeric variable, you would never see 'A1' or 'A2' in the PROC PRINT of the data, because they are invalid numeric values. Another clue about whether a variable is stored as character or numeric is that in a default PROC PRINT, character variables are LEFT justified and numeric variables are RIGHT justified.
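If you would rather check the type directly than rely on justification, here is a minimal sketch. The data set name typecheck and the variable vartype are hypothetical, made up for this illustration; PROC CONTENTS and the VTYPE function are standard SAS.

```sas
/* Hypothetical sample data: var1 is read as character because of the $ */
data typecheck;
   input var1 $;
   datalines;
101
A1
;
run;

/* The Type column of the PROC CONTENTS output lists Char or Num */
proc contents data=typecheck;
run;

/* VTYPE returns 'C' for a character variable and 'N' for a numeric one */
data _null_;
   set typecheck (obs=1);
   vartype = vtype(var1);
   put "Type of var1: " vartype;
run;
```

Either check should report var1 as character, even for the row whose value is 101.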

Variables have NAMES and they have VALUES. The NAME is how you reference the variable. The VALUE is what is stored "inside" the variable in a place in memory referenced by the variable NAME.

So, when you did this:
if var1 = ("101" or "A1") then var2 = "G1";
else var2 = G2;

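you had 2 mistakes. The first was the syntax of the IF statement. The OR operator works on numeric (true/false) values, so SAS tried to convert "101" and "A1" to numbers before comparing the result with var1. That is where the conversion note and the invalid numeric data message came from. The correct alternatives are:
1) repeating the variable in each comparison:
if var1 = "101" or var1 = "A1" then var2 = "G1";
< more code >
2) using the IN operator as shown in the earlier reply.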
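To see that the two correct forms agree, here is a small sketch that runs both against the same rows. The data set names sample and grouped and the variables via_or and via_in are hypothetical, chosen only for this comparison.

```sas
/* Hypothetical sample data matching the values in the question */
data sample;
   input var1 $;
   datalines;
101
A1
102
A2
;
run;

data grouped;
   set sample;
   length via_or via_in $2;

   /* Form 1: repeat the variable in every comparison */
   if var1 = "101" or var1 = "A1" then via_or = "G1";
   else via_or = "G2";

   /* Form 2: the IN operator with a list of quoted constants */
   if var1 in ("101", "A1") then via_in = "G1";
   else via_in = "G2";
run;

proc print data=grouped;
run;
```

Because every constant is quoted and var1 is compared directly, no character-to-numeric conversion should be triggered, and via_or and via_in should hold the same group on every row.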

Next, you had:
else var2=G2;
G2 is not the variable NAME. Without quotes around G2 to indicate that it is a character constant, SAS was forced to conclude that you were referencing a -variable- named G2 -- which by default, it thought was a numeric variable. When you want to assign the variable -named- VAR2 to have the -value- "G2", then you must put the character string/constant in quotes.

Or, you could have done this:
temp = "G2";
var2 = temp;

which is unnecessary, when this:
var2 = "G2";

is what you meant to do. Life is easier when you want to assign a number to a new variable. Numeric constants are unquoted. If you wanted to create var3 as a number, you could have done:
if var1 in ('101', 'A1') then var3 = 10;
else var3 = 11;

However, another technique that makes it unnecessary to code an IF statement at all is to create a user-defined format that contains the "recoding" you want. You could keep the var1 values stored internally as 101, A1, A2, etc. and then just DISPLAY them as G1 or G2. Or you can create a new variable by using the format with the PUT function, as shown in the last DATA step below.

data mytable;
infile datalines;
input id var1 $;
datalines;
11 101
12 A1
13 102
14 A2
15 102
16 A1
17 A3
;
run;

proc format;
value $mygrp 'A1', '101' = 'G1'
             'A2', '102' = 'G2'
             other = 'UN';
run;

ods listing;
proc freq data=mytable;
title 'Use Format without making new variable';
tables var1;
format var1 $mygrp.;
run;

proc print data=mytable;
title 'Format with PROC PRINT';
format var1 $mygrp.;
run;

** Make new variable;
data newvar;
set mytable;
newvar = put(var1,$mygrp.);
run;

proc print data=newvar;
title 'PRINT showing new var created';
run;
