BookmarkSubscribeRSS Feed
AnnaBrown
Community Manager

Hi there,

 

So often this question comes up on the communities...and here's a tutorial video about how to convert character to numeric, and vice versa, in SAS.

 

 

 

22 REPLIES 22
hashman
Ammonite | Level 13

@AnnaBrown:

A very nice little lecture! Just a few (well-intended) notes:

  • You mentioned that instead of creating a new data set with the new type-converted column, you can add this column to the original data set. That would be nice. Problem is, you can only add rows to an existing data set (via APPEND) but not columns; to achieve this effect, you have to completely rewrite the whole file with the new variable added, which for large files can be quite taxing. However, the effect of adding a column can be emulated by creating a view (with the same conversion code you're showing) and referencing it down the line instead of the existing data set. This way, the wolves are no longer hungry (you got a new column), and the sheep are still alive (the original data set is not rewritten, and no disk space is wasted).   
  • The way you code CATX, it creates the concatenated column with the default length of 200, which is rather unwieldy for this kind of result and just gobbles up disk space. This goes unnoticed for the audience because you don't show the SAS log; otherwise, it would see the corresponding log note. 
  • IMO, in teaching SAS showing the log is critical. The very first thing one routinely does after running a SAS program is examining the log for errors, warnings, notes, the number of records read and written, etc.; and only if everything is hunky-dory in the log department, going on to eyeball the output.   

Kind regards

Paul D.

ChrisNZ
Tourmaline | Level 20

@hashman I like the wolf and sheep idiom! Where does it originate?

Maybe the English version would be have your cake and eat it too? Or maybe this alteration of the famous adage: "no pain still gain"

hashman
Ammonite | Level 13

@ChrisNZ;

To the best of my knowledge, the origin is Russian: "И волки сыты, и овцы целы" (lit: Both the wolves are sated and the sheep are intact). 

Yup, "have the cake and eat it, too" is the closest I can think of. 

AnnaBrown
Community Manager

You all make me laugh, @hashman and @ChrisNZ! I'm adding @Anna_Yarbrough (so many "Annas") to this discussion as she can speak to the areas you pointed out, @hashman. Thanks for viewing and sharing your input!

 

Anna 

yabwon
Onyx | Level 15

We have the same one in Polish: "Wilk syty i owca cała", but I'm not sure whether it is "native" or "imported" 🙂

 

In practice: https://desa.pl/media/img/cms/auction_objects/39623/582d1e9e618bc22267e685a346b2d0a8.jpg

 

B-)

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



Anna_Yarbrough
SAS Employee

Hi Paul D.,


Thanks for taking the time to watch and comment on this video! A few comments in response.

 

Creating a View instead of Rewriting our Data:

Great point about creating a view! You can do this with PROC SQL or the DATA step. To do it with the DATA step just specify the VIEW option on the DATA statement. Here is an example:

 

data race_new / view=race_new;
	set race(rename=(race_date=race_date_old));
	race_date=input(race_date_old, date9.);
	format race_date mmddyy10.;
	drop race_date_old;
run;

 

To look at the stored code in your view later, you can run a describe on the view:

data view=race_new;
describe;
run;

CATX and the length of NC_ID:

As you mentioned, the use of CATX here does create a column with a default length of 200 characters. If you run the code, there are actually no notes in the SAS log about this; however, doing a PROC CONTENTS on the newly created data set it is apparent, so let’s talk about how we can fix it. One easy fix is to use the length statement in the DATA step before the creation of the column. I can put this right after the SET statement, or if I want to maintain the order of my columns in my data set, I can put it right before I define the NC_ID column. Here is what that would look like:

length NC_ID $10.;

 

Showing the Log:

I agree, it is important to check out the log. I’ll be sure to highlight this more in future tutorials!

 

Again, thanks for taking the time to check out this tutorial. I hope you spend some more time exploring the other tutorials on the SAS Users YouTube channel! 

hashman
Ammonite | Level 13

@Anna_Yarbrough:

Thanks for your reply. I'd add a couple of things:

1. Declaring NC_ID in the LENGTH statement beforehand is one way. A FORMAT statement with $w. and appropriately selected W work, too. The latter has the advantage in that it can be applied directly to the result of a CAT* function without a separate line of code:

NC_ID = put (CATX (<arguments>), $10.) ;

The same trick can be used with any other character function returning $200 by default, such as REPEAT, IFC, etc. However, if the result is expected to be longer than 200, sizing the receiving variable beforehand is imperative since otherwise the function's buffer will be limited to 200, while if the variable is already sized, the function uses its length as its buffer length up to 32767.

 

2. "If you run the code, there are actually no notes in the SAS log about this".

If I run the code, I do get the note because I always have the system option:

option msglevel=i ;

turned on. If you include it, you'll also get a log message like:

INFO: Character variables have defaulted to a length of 200 at the places given by:
      (Line):(Column). Truncation can result.
      4:3   NC_ID

In the real world, one shouldn't miss messages of this nature and correct one's code correspondingly if any are seen in the log. This is why I never run SAS without the option MSGLEVEL=I turned on and hence simply include it at the invocation or the config or autoexec file. Having it on also tells many other useful things, such as whether an index is defined; which access methods and execution plan SQL uses when you code the _METHOD and _TREE option with the SQL statement, and many other useful things. 

 

Kind regards

Paul D. 

   

  

 

ScottBass
Rhodochrosite | Level 12

Full disclosure:  I haven't watched the video.

 

However, this subject matter really confused me when I first started SAS:

 

  • Source Variable:  either numeric or character
  • Target Variable:  either numeric or character
  • Choice of Function:  either PUT() or INPUT()
  • Choice of Format:  either numeric or character format/informat

 

That's 2**4 possible combinations.  Sheesh, no wonder I was confused 🙂

 

However, these rules below clarified it for me:

 

  • PUT function:  ALWAYS returns character output, accepts either character or numeric input.
  • INPUT function:  ALWAYS expects character input, returns either character or numeric output.

 

Once I learned these rules, then it's just a matter of matching the format/informat type to the input/output:

 

  • PUT function, character input, (always character output):  Use a character format.
  • PUT function, numeric input, (always character output):  Use a numeric format.
  • INPUT function, (always character input), character output:  Use a character informat.
  • INPUT function, (always character input), numeric output:  Use a numeric informat.

 

Finally, I always remember that, if I want a number as a returned value, use INPUT.  That also means if I'm using a user defined format (i.e. PROC FORMAT) I need to create a numeric INFORMAT.  I've seen many a code sample where the end user created a numeric FORMAT, then used nested INPUT(PUT(...),...) to get the desired result.

 

 


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
hashman
Ammonite | Level 13

@ScottBass:

Thanks, Scott. A fantastic recap on one of the SAS topics that confuse tyros (and not only tyros) the most.

Your list of the possible combinations also explains why there's no in/format that would have both numeric argument and numeric response - there's no combination of numeric input and numeric output for that.

Speculating on the topic still further, I'd say that no piece of data enters the SAS System without an informat and leaves it without a format, respectively, no matter whether the in/format is used implicitly or explicitly.

 

Kind regards

Paul D.   

yabwon
Onyx | Level 15

Hi @ScottBass 

 

there is one more "funny" question, that was always bugging me. 

Character format and character informat - why two of them, is there any difference I'm missing that we need them?

 

All the bets

Bart

 

proc format;
  invalue $ abc
  "A"="a"
  "B"='b'
   C = c
  other = "_"
  ;

    value $ abc
  "A"="a"
  "B"='b'
   C = c
  other = "_"
  ;
run;


data Test;
  do letter = "A","B","C","D";
    f=  put(letter,$abc.);
    i=input(letter,$abc.);
    output;
  end;
run;
_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



ScottBass
Rhodochrosite | Level 12

@yabwon wrote:

Hi @ScottBass 

 

there is one more "funny" question, that was always bugging me. 

Character format and character informat - why two of them, is there any difference I'm missing that we need them?

 

All the bets

Bart

 

proc format;
  invalue $ abc
  "A"="a"
  "B"='b'
   C = c
  other = "_"
  ;

    value $ abc
  "A"="a"
  "B"='b'
   C = c
  other = "_"
  ;
run;


data Test;
  do letter = "A","B","C","D";
    f=  put(letter,$abc.);
    i=input(letter,$abc.);
    output;
  end;
run;

 

Hi @yabwon,

 

  • Character format:  expects character input, always returns character output
  • Character informat:  always expects character input, returns character output

So sure, if you have character input and are desiring character output, then either a character format or informat will work (i.e. give you the desired results). 

 

You might get different results between the two if you munged the input/output data types, i.e. character format assigning to a numeric variable, or character informat with numeric input.  Sometimes SAS will do the implicit type conversion, and sometimes it generates an error, depending on the combination of function, in/format, input and output variable type.  I myself never like implicit type conversion, and I HATE errors :).  So, I always try to explicitly convert the type myself via INPUT/PUT.  Note that unnecessary type conversions (i.e. using a format when you should use an informat to return a numeric value) will affect performance.

 

However, character informats (INVALUE statement) have additional options, such as JUST and UPCASE:

 

* character format:   expects character input, always returns character output ;
* character informat: always expects character input, returns character output ;

proc format;
   * character informat, expects character input, always returns character output ;
   invalue $ abc
   "A"="a"
   "B"='b'
   C = c
   other = "_"
   ;

   * character format, always expects character input, returns character output ;
   value $ abc
   "A"="a"
   "B"='b'
   C = c
   other = "_"
   ;

   * testing: do quotes even matter? (nope, and is documented);
   invalue $ noq
   A=a
   B=b
   C=c
   other=_
   ;

   value $ noq
   A=a
   B=b
   C=c
   other=_
   ;

   * informats also have the upcase and just option ;
   invalue $just (just)
   A=X
   B=Y
   C=Z
   other=_
   ;

   invalue $upc (upcase)
   A=X
   B=Y
   C=Z
   other=_
   ;

   invalue $justupc (just upcase)
   A=X
   B=Y
   C=Z
   other=_
   ;

   %macro comment;
   value $upc (upcase)  /* but not formats, which makes sense */
   A=a
   B=b
   C=c
   other=_
   ;
   %mend comment;  * comment out using uncalled macro ;

   * a numeric informat ;
   invalue num 
   1=1
   2=2
   3=3
   other=.Z
   ;
run;


data Test1;
   do letter = "A","B","C","D";
      f=  put(letter,$abc.);
      i=input(letter,$abc.);
      output;
   end;
run;

data Test2;
   do letter = "A","B","C","D";
      f=  put(letter,$noq.);
      i=input(letter,$noq.);
      output;
   end;
run;

data Test3A;  * justify ;
   length letter $5;
   do letter = "A"," B","  C","   D";
      i=input(letter,$just5.);
      output;
   end;
run;

data Test3B;  * upcase ;
   length letter $5;
   do letter = "a","b","c","d";
      i=input(letter,$upc5.);
      output;
   end;
run;

data Test3C;  * justify+upcase ;
   length letter $5;
   do letter = "a"," b","  c","   d";
      i=input(letter,$justupc5.);
      output;
   end;
run;

data Test4;
   do num=1 to 4;  
      i=input(num,num.);  * implicit type conversion for the input function ;
      i2=input(put(num,best.),num.);  * explicit type conversion for the input function ;
      output;
   end;
run;

 

Getting the Test3's to give the correct results was tricky; I finally remembered to add a length suffix to the informats, since the default length is based on the input values (with a little help from Google and http://support.sas.com/resources/papers/proceedings10/022-2010.pdf)

 

As you noted in your code, the quotes are optional.  From the doc:

 

https://documentation.sas.com/?docsetId=proc&docsetTarget=p1pmw90bl3jzgdn1w4202kclxtho.htm&docsetVer...

 

If you omit the single or double quotation marks around character-string, then the INVALUE statement assumes that the quotation marks are there.

 

The VALUE statement says:

 

You must enclose a formatted value in single or double quotation marks. The following example shows a formatted value that is enclosed in double quotation marks

 

But the above shows that is not always the case.  I thought they would be required if the label contains spaces, but that does not appear to be the case:

 

proc format;
   * testing: do quotes even matter? ;
   invalue $ noq
   A=a a a
   B=b
   C=c
   other=_
   ;

   value $ noq
   A=a
   B=b b b 
   C=c
   other=_
   ;
run;

data Test2;
   do letter = "A","B","C","D";
      f=  put(letter,$noq.);
      i=input(letter,$noq.);
      output;
   end;
run;

The above appears to work, but doesn't give the correct results if you do:

 

proc format;
   value $ noq
   A=a
   B=b b b 
   C C C=c
   other=_
   ;
quit;

Best practice is to follow the SAS documentation regarding quotation marks.

 

 

HTH...


Please post your question as a self-contained data step in the form of "have" (source) and "want" (desired results).
I won't contribute to your post if I can't cut-and-paste your syntactically correct code into SAS.
yabwon
Onyx | Level 15

Hi @ScottBass ,

 

Scott thanks for your response, it definitely helped! Thank you for reminding me that JUST and UPCASE are only available in informats, I should checked out the doc instead my memory... 🙂

 

Few thoughts: 

a) You wrote:

I myself never like implicit type conversion, and I HATE errors.  So, I always try to explicitly convert the type myself via INPUT/PUT.

I couldn't agree more. Doing implicit conversion is very hmmm....  "un-elegant". I have my small "metaphor" about such programming practice but it is not suitable for "public debate" 😉 maybe if we'll have chance to talk face to face I share 🙂

 

b) Thanks for Ron Cody's article, I'll read it for sure.

 

c) I also agree about quotes, true I used one without quotes in my example, but in practice I think quoting is the best practice. The last example you shared:

proc format;
   value $ noq
   A=a
   B=b b b 
   C C C=c
   other=_
   ;
quit;

 that behaviour looks like the Title's behaviour, you could do it:

title "A b c d" C=red " e f g";
proc print data = sashelp.class(obs=1);
run;

title A b c d C=red e f g ;
proc print data = sashelp.class(obs=1);
run;

Both "works" but the first one is 1) clearer 2) "warningless".

 

Thanks again!

All the best

Bart

_______________
Polish SAS Users Group: www.polsug.com and communities.sas.com/polsug

"SAS Packages: the way to share" at SGF2020 Proceedings (the latest version), GitHub Repository, and YouTube Video.
Hands-on-Workshop: "Share your code with SAS Packages"
"My First SAS Package: A How-To" at SGF2021 Proceedings

SAS Ballot Ideas: one: SPF in SAS, two, and three
SAS Documentation



AndreasF
Fluorite | Level 6

Hello there!

 

Thanks for the video and the simple explanation for put and input. Well, "simple" ... As simple as it can get with SAS, let's say.

All those puts and inputs, date9s, z3.s are pretty much why starters are so unhappy with SAS. Any intutive way to understand that? Any experience with other programming languages which you could transfer and use? Tbh, I don't see it and I understand some people in my department, who are using SAS only to download data, put it (or input it, lol) in Excel or R and then go on from there. The learning curve in SAS is just very slow and often frustrating.

 

ChrisNZ
Tourmaline | Level 20

R easier to learn than SAS? You had a hell of a good teacher!!

 

The numeric formats date9. and z4. are way to format numbers into strings.

Is that number to be displayed as a hex string? a date? a time? how many decimals do you want? or leading zeros? or percent or dollar sign?

 

The numeric informats date9. etc allow the reverse: read a string as a (into a) number.

If that string a date? a time? in what format? and so on

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 22 replies
  • 15474 views
  • 72 likes
  • 12 in conversation