New SAS User

Completely new to SAS or trying something new with SAS? Post here for help getting started.
BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
moni2
Calcite | Level 5

Hello,

I am trying to convert a variable from a character to a numeric variable. 

The original character variable has numeric values, special symbols ":, *), numeric values with symbols ">3, -3, >=10", and text such as "see comment" and " ____MARKED". Ultimately, I want to convert this to a numeric variable because I will need to do summary statistics on it later. However, I do need to keep the "-3" the other values such as ">3" can become "3". 

 

I wrote the following program:

DATA L1;
LENGTH RV $ 100; 
SET Labs;
RESULT_VALUE = COMPRESS(RESULT_VALUE,'<> >=+'); 
IF RESULT_VALUE IN('' ':') THEN RV = ''; 
ELSE RV=RESULT_VALUE;
IF anyalpha(RV) THEN DELETE;
ELSE RV = INPUT(RV, 12.); 
RUN;

I was able to successfully remove the text values and symbols, but, "-3" got converted to "3" and the variable is still character. Additionally, on doing a subsequent  PROC FREQ it the original variable "result_value" is missing its original text and special values which I thought should not happen. 

 

Could you please help me understand how to do this differently?

 

Thank you. 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

I would use TRANSLATE() instead of COMPRESS() to get rid of the characters you don't want.  That will reduce the risk of forming a number from separate digit strings in the string.  But probably remove the commas so strings like 1,234,567 will be recognized.

 

data have;
  input string $32. ;
cards;
2.8
25
Slight
>=60
0.3
4.56E2
Moderate
37
Marked
7.0000000000000007-2
1,230
+45
-3
100-200
34>56
;

data want;
  set have;
  clean=left(translate(compress(string,','),' ',compress(string,'+-. E','d')));
  number=input(clean,??32.);
run;

proc print;
run;

Result

Obs    string                  clean                    number

  1    2.8                     2.8                         2.8
  2    25                      25                         25.0
  3    Slight                                               .
  4    >=60                    60                         60.0
  5    0.3                     0.3                         0.3
  6    4.56E2                  4.56E2                    456.0
  7    Moderate                                             .
  8    37                      37                         37.0
  9    Marked                                               .
 10    7.0000000000000007-2    7.0000000000000007-2         .
 11    1,230                   1230                     1230.0
 12    +45                     +45                        45.0
 13    -3                      -3                         -3.0
 14    100-200                 100-200                      .
 15    34>56                   34 56                        .

 

 

 

View solution in original post

6 REPLIES 6
PaigeMiller
Diamond | Level 26

@moni2 wrote:

The original character variable has numeric values, special symbols ":, *), numeric values with symbols ">3, -3, >=10", and text such as "see comment" and " ____MARKED". Ultimately, I want to convert this to a numeric variable because I will need to do summary statistics on it later. However, I do need to keep the "-3" the other values such as ">3" can become "3". 


I think we would need a complete set of rules about what to do with non-numeric characters that are found in the data. The above discusses just one such rule.

--
Paige Miller
HB
Barite | Level 11 HB
Barite | Level 11

Add some sample data in a working data step along with desired results

ballardw
Super User

First, this makes no sense:

ELSE RV = INPUT(RV, 12.); 

because you have defined RV to be a 100 length character variable. So you are going to input something, that may actually be numeric in appearance and then use internal default conversions to make it fit into a character value.

 

Your -3 disappearing is likely because of the COMPRESS you have chosen. You are removing the SPACE character. So if there is anything before the -3 (or similar) then  "a    -3" after compress becomes "a-3" which will not read as numeric with a 12. informat.

So look at the full value of variable with the disappearing -3. Subtle issues might be the presence of TAB or NULL characters that appear to be spaces but are not actually.

 

I might consider, lacking anything resembling many examples

RESULT_VALUE = COMPRESS(RESULT_VALUE,'0123456789-.','k'); 

to KEEP digits, the - and . (assumes some decimal values) and discard everything else.

 

 

moni2
Calcite | Level 5

thank you to everyone who offered suggestions; 

 

Adding the " RESULT_VALUE = COMPRESS(RESULT_VALUE,'0123456789-.','k'); " was helpful but then it created a value which was not there before "7.0000000000000007-2" in either then old or the new variable and the old and the text values from the old variable are missing. 

 

 Here is a sample of the values in the variable for those who asked:

RESULT_VALUE
2.8
8.5
8.3
25
26.3
33.4
Slight
140
138
248
0.96
9.4
78.6
15.3
43.8
100
20
14.6
4.8
0
19
>=60
5.57
0.3
59
15
3.7
0.2
2
16.5
47.1
0.5
79.9
15.4
0.1
75.5
4
Moderate
2
3.41
0.3
Slight
37
Marked
Moderate
27.9
2

 

Again, I don't need the text values in my new variable, "-3" should be preserved, and "=>60" can be just 60. And ultimately, I want to create a new numeric variable free from text. 
thank you everyone for your patience.
HB
Barite | Level 11 HB
Barite | Level 11

What is the numeric value for "Moderate"?

Tom
Super User Tom
Super User

I would use TRANSLATE() instead of COMPRESS() to get rid of the characters you don't want.  That will reduce the risk of forming a number from separate digit strings in the string.  But probably remove the commas so strings like 1,234,567 will be recognized.

 

data have;
  input string $32. ;
cards;
2.8
25
Slight
>=60
0.3
4.56E2
Moderate
37
Marked
7.0000000000000007-2
1,230
+45
-3
100-200
34>56
;

data want;
  set have;
  clean=left(translate(compress(string,','),' ',compress(string,'+-. E','d')));
  number=input(clean,??32.);
run;

proc print;
run;

Result

Obs    string                  clean                    number

  1    2.8                     2.8                         2.8
  2    25                      25                         25.0
  3    Slight                                               .
  4    >=60                    60                         60.0
  5    0.3                     0.3                         0.3
  6    4.56E2                  4.56E2                    456.0
  7    Moderate                                             .
  8    37                      37                         37.0
  9    Marked                                               .
 10    7.0000000000000007-2    7.0000000000000007-2         .
 11    1,230                   1230                     1230.0
 12    +45                     +45                        45.0
 13    -3                      -3                         -3.0
 14    100-200                 100-200                      .
 15    34>56                   34 56                        .

 

 

 

sas-innovate-white.png

Our biggest data and AI event of the year.

Don’t miss the livestream kicking off May 7. It’s free. It’s easy. And it’s the best seat in the house.

Join us virtually with our complimentary SAS Innovate Digital Pass. Watch live or on-demand in multiple languages, with translations available to help you get the most out of every session.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 6 replies
  • 2565 views
  • 0 likes
  • 5 in conversation