Is there a better way to convert variables from character to numeric and vice versa?

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 88
Accepted Solution

Is there a better way to convert variables from character to numeric and vice versa?

I read on one of these forums that one way to convert character variables to numeric is to do the following code:

 

data bndados;
	set bndados
		(rename=(MED_REC_PMED_START15=MED_REC_PMED_START15CHAR));
	MED_REC_PMED_START15= input(MED_REC_PMED_START15CHAR, 10.);
	drop MED_REC_PMED_START15CHAR;
run;

data bndados;
	set bndados
	(rename=(MED_REC_PMEDS_REASON15=MED_REC_PMEDS_REASON15CHAR));
	MED_REC_PMEDS_REASON15= input(MED_REC_PMEDS_REASON15CHAR, 10.);
	drop MED_REC_PMEDS_REASON15CHAR;
run;

However, what happens if I have MANY variables to convert? This seems like a very inefficient way of converting variables. I tried combining the conversion into one data step but it does not seem to want to work in SAS. So for example, I tried doing it this way:

 

 

data bndados;
	set bndados
(rename=(MED_REC_PMED_START15=MED_REC_PMED_START15CHAR)); MED_REC_PMED_START15= input(MED_REC_PMED_START15CHAR, 10.); drop MED_REC_PMED_START15CHAR; (MED_REC_PMEDS_REASON15=MED_REC_PMEDS_REASON15CHAR)); MED_REC_PMEDS_REASON15= input(MED_REC_PMEDS_REASON15CHAR, 10.); drop MED_REC_PMEDS_REASON15CHAR; run;

Is there a better way to convert a number of variables in one go?

 

 

 

 

 

 

 


Accepted Solutions
Solution
‎12-01-2016 11:17 AM
Super User
Super User
Posts: 7,977

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

Yep, thats the way.  This bit:

   informat SUBJECT_ID best32. ;
   informat FIRST_NAME $20.;
   format SUBJECT_ID best12. ;
   format FIRST_NAME $20.;
   input
       SUBJECT_ID
       FIRST_NAME $;

Tells SAS how to read the data - informat - what to display the data as - format - and to read the data each variable name with $ if its character.

For this:

   infile 'C:\Users\t46837uhn\Documents\Med Rec\Med Rec Evaluation\Final\BN_MEDREC.csv'
   delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;  ***DO I NEED THIS LINE?***

 infile states the file, and delimiter (you can shrink to dlm=) shows what is between data elements.  The other parts are options that affect how the data is read, or how its to be processed.  Am not going to explain all the options, the SAS docs are there for that and theres a lot of papers:

http://www2.sas.com/proceedings/sugi26/p009-26.pdf

https://communities.sas.com/t5/General-SAS-Programming/Please-explain-DSD-and-DLM-differences/td-p/1...

You may need these options, you may not, depends on your data.

Am logging off now, but can pick up tomorrow if you still have problems.

View solution in original post


All Replies
Super User
Posts: 11,343

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

If you have many variables to convert then I submit that the problem lies earlier in the process and likely at an import stage where the data was read from an external source. You issue is a common result of reliance on Proc Import without either using the correct options or understanding that import has to guess from the content what the variable should be. There are entirely too many systems that use files that do not constrain data types (spreadsheets being the worst offenders).

 

If you can trace this back to an import then examine that step. What type of file was involved and how was it read? If proc import what options were used? Some times the only really solution to control your data is to make sure the file is in a text format such as CSV and write a program that allows you to set the variable properties.

 

I am a big believer in fixing data structure issues as early as practical.

Frequent Contributor
Posts: 88

Re: Is there a better way to convert variables from character to numeric and vice versa?

[ Edited ]

Hi Ballardw!

 

Yes, I did use a proc import step and I imported it as a CSV. I had to import 4 separate CSV files.

 

This is the code I used:

 

proc import out=bndados
	datafile='xxxx'
	dbms=csv replace;
	*Optional statements are below; 
	getnames=yes; 
	guessingrows=200;
run;

I had to use the above code 4 times for the 4 separate CSV files. I read that I should use guessingrows so that SAS can automatically interpret what "type" each variable should be. However, this didn't realyl work b/c I think for some of the variables there were no values at all for a variable and so one file might have been inported as a characted variable while in the next imported csv file it did have values and so imported them as numeric. 

 

 

Do you have any suggestions as to how I can fix this problem?

Super User
Posts: 11,343

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

If the files were of the same format I believe best way is to run proc import once, capture the code generated into the editor and modify setting appropriate variable types and lengths. If the files are structured the same but have greatly different actual values you may have to revisit this step multiple times to get appropriate lengths. You can then read each of the external files into a separate data set by changing the INFILE and data set names.

 

If the files are not structured the same the process can be similar but you check which variables should be the same variable name and type and set those in common in the different data sets in the data step code.

 

I suspect in the past 10 years I have used Proc Import to generate the core of at least 100,000 lines of code and had maybe two projects with very small data sets where I left the data set resulting from proc import alone. Generally I have to set at least one characteristic, usually ID values, to a different type than import guesses.

 

Yes it is work, but attention to detail like this is what analysis or programming is about.

Super User
Super User
Posts: 7,977

Re: Is there a better way to convert variables from character to numeric and vice versa?

[ Edited ]
Posted in reply to christinagting0

There is no "better" way to convert character to number other than input, same with the other way round using put.  That being said there are bettter ways of grouping your variables in this order of precednce: 

1) First, why have lots of variables - use a normalised (i.e. data going down rather than across) structure - this method would solve half at least of these types of questions.  Bad structure = bad programming to cover it.

2) Use numeric suffixed variables which are all part of a group, e.g:

result1 result2 result3

This way you can refer to these variables in a very simple method of: cat(of result: ), or apply an array very simply.

3) Array all the necessary variables, then loop over them:

data want (drop=avar1 something other other2 xyz ...);
  set have;
  array vars_in{10} avar1 something other other2 xyz ... ;
  array results{10} 8.;
  do i=1 to 10;
    results{i}=input(vars_in{i},best.);
  end;
run;

Also, two final notes - please don't code all in upcase, its really hard to read.  Also, why do you have the need to convert lots of variables - it sounds like your datasource is wrong to start with.  Perhaps you import from Excel (which is bad to start), perhaps that is what is given to you.  I would fix the problem there rather than try to work around it.

 

Edit:  Seems like you confirmed what I say above.  Simple answer, don't leave importing your important data to a guessing procedure.  Write a datastep import and specify exactly what length, format, type the data you are readin in is - which is information you know, but the guessing procedure does not.

Frequent Contributor
Posts: 88

Re: Is there a better way to convert variables from character to numeric and vice versa?

Hi RW9. How do I write a datastep import to specify what length, format and type I want for each variable? Also, how am I suppose to know at the very initial import step which variables will not match with the other variables in my other CSV files? I guess I should mention that my end goal is to merge all 4 CSV files into one giant dataset (i.e., append them). So having all variables be the same type is important right? I need step by step instructions...lol very new to SAS and don't really know what I'm doing
Super User
Super User
Posts: 7,977

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

I agree with @ballardw, look at the log after your proc import, copy the code from there and modify as you need.  That is all proc import does, is guess what your data looks like and then writes the datastep import.

 

As for this statement:

"Also, how am I suppose to know at the very initial import step which variables will not match with the other variables in my other CSV files?"

If you don't know what your data is I am not sure where you go from there.  Where is your data import agreement, data specification etc.?  Key to the process of the transmittal of any data is the understanding of the structure, and content of the data - this here is more important than any import code or anything else, it is the core understanding of what data is recieved and how it is structured.

You are correct, appending the data will require both the same type and the same length.  To this end again the above information is critical.  What if in CSV1 you have a varible A which contains numeric data, and in B you have non numeric data.  What happens then.  

Sorry, to sound a bit pedandantic, but 99.9% of programming is documentation and most of the rest of it is checking the documentation.

Frequent Contributor
Posts: 88

Re: Is there a better way to convert variables from character to numeric and vice versa?

[ Edited ]

Thank you @ballardw and @RW9. What you are both saying makes a lot of sense and I think will help me immensely for future code I run as well. I really appreciate you both helping me with this. I don't really have anyone I can ask for help.

 

If you don't mind can you both let me know if my next plan of action is correct? I am not sure what I am doing and have never really used an infile data step before.

 

As per your advice I am going to copy and past the infile code that is outputted in my log and do the following. I am using 2 variables as examples here:

 

 

data WORK.BNDADOS ;
   infile 'xxxx'
   delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;  ***DO I NEED THIS LINE?***
   informat SUBJECT_ID best32. ;
   informat FIRST_NAME $20.;
   format SUBJECT_ID best12. ;
   format FIRST_NAME $20.;
   input
       SUBJECT_ID
       FIRST_NAME $;
run;

 

Then, I will do this for my 3 other CSV files.

 

Is this correct? Am I on the right track?

 

And thanks @ballardw for letting me know that you usually do have to tweak your variabels when doing a proc import. I'm not sure what the norm is for everyone else so this is helpful to know tha I'm not alone.

 

And thanks @RW9, I looked at my data dictionary to know what each of my variabels should be. I think I misunderstood what you were saying!

 

what do you both think of the above?

 

 

Solution
‎12-01-2016 11:17 AM
Super User
Super User
Posts: 7,977

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

Yep, thats the way.  This bit:

   informat SUBJECT_ID best32. ;
   informat FIRST_NAME $20.;
   format SUBJECT_ID best12. ;
   format FIRST_NAME $20.;
   input
       SUBJECT_ID
       FIRST_NAME $;

Tells SAS how to read the data - informat - what to display the data as - format - and to read the data each variable name with $ if its character.

For this:

   infile 'C:\Users\t46837uhn\Documents\Med Rec\Med Rec Evaluation\Final\BN_MEDREC.csv'
   delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;  ***DO I NEED THIS LINE?***

 infile states the file, and delimiter (you can shrink to dlm=) shows what is between data elements.  The other parts are options that affect how the data is read, or how its to be processed.  Am not going to explain all the options, the SAS docs are there for that and theres a lot of papers:

http://www2.sas.com/proceedings/sugi26/p009-26.pdf

https://communities.sas.com/t5/General-SAS-Programming/Please-explain-DSD-and-DLM-differences/td-p/1...

You may need these options, you may not, depends on your data.

Am logging off now, but can pick up tomorrow if you still have problems.

Super User
Posts: 5,516

Re: Is there a better way to convert variables from character to numeric and vice versa?

[ Edited ]
Posted in reply to christinagting0

Have you tried to explore why SAS thought it was a good idea to make these variables character?  You could test SAS's decision with this sort of code:

 

proc freq data=have;

tables charvar;

where charvar > ' ' and input(charvar, ??10.)=.;

run;

 

That will display the frequency of all values that can't be converted to numeric.  The results might give you a way to improve the earlier processing before trying to import into SAS.

Frequent Contributor
Posts: 88

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to Astounding

@Astounding thanks! that is a neat trick I did not know you could do!

Super User
Posts: 11,343

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to christinagting0

Depending on your data sources you may get "special" values in columns that are supposed to numeric such as "N/A" or "NA" (common for not applicable) or "NULL". These are what will tell Proc Import to use character input. Forcing a numeric informat will result in missing values for those records (approriate, hard to do arithmetic) but can lead to many messages about invalid data in variablex results.

 

SAS has options to suppress those warning messages by you can miss things. Better are custom formats that specifically set the expected values to missing and the remainder to numerics. Then if you get a value like "FRED" when "NA" is expected you know that you have another issue to investigate. Topic for separate discussion.

 

SOMETIMES the problem is that there are line feeds or carriage returns in text fields (notes fields in spreadsheets for instance). That may require file editing as these are not really patterns and can be extremely difficult to program around.

Valued Guide
Posts: 505

Re: Is there a better way to convert variables from character to numeric and vice versa?

Not sure this answers your question

SAS Forum: Is there a better way to convert variables from character to numeric and vice versa?

/* T0100690 Converting Numeric <-> Character  using the same variable names

inspired by
https://goo.gl/xdtMcU
https://communities.sas.com/t5/General-SAS-Programming/Is-there-a-better-way-to-convert-variables-from-character-to/m-p/315546

HAVE DATASET CLASSCHR
=====================

 -- CHARACTER --
AGEDAD           C  3   33
AGEMOM           C  3   32
AGEKID           C  3   13


 -- NUMERIC --
AGE              N  8   13

WANT ALL AGES NUMERIC
=====================

 -- NUMERIC --
AGEDAD          N   8   33
AGEMOM          N   8   32
AGEKID          N   8   13
AGE             N   8   13

WORKING CODE
============

proc sql
   input( AGEDAD ,best32.) as AGEDAD


FULL SOLUTION
=============

* create some data;
data classchr;
   set sashelp.class(keep=age:);
   AgeDad = put(age+20,3.);
   AgeMom = put(age+19,3.);
   AgeKid = put(age   ,3.);
run;quit;

* do the conversion char to num;
data _null_;
   set classchr(obs=1);  * put this first;
   array chrs _character_ ;
   length namchr $32756;
   do i=1 to dim(chrs);
     namchr=catx(' ',namchr,',input(',vname(chrs[i]),',best32.) as',vname(chrs[i]));
   end;
   call symputx('sel',namchr);
   rc=dosubl('
     proc sql;
        create
           table classnum (drop=nocomma) as
        select
           "nocomma"  as nocomma
           &sel
          ,*
        from
           classchr
        ;quit;
    ');
    stop;
run;quit;


* you do get these warnings but sql uses the first 'numeric' data

WARNING: Variable AGEDAD already exists on file WORK.CLASSNUM.
WARNING: Variable AGEMOM already exists on file WORK.CLASSNUM.
WARNING: Variable AGEKID already exists on file WORK.CLASSNUM.

* easy to do the reverse use put instead of input

GENERATED CODE
===============

829      proc sql;
830          create
831             table classnum (drop=nocomma) as
832          select
833             "nocomma"  as nocomma
834             &sel
               ,input( AGEDAD ,best32.) as AGEDAD
               ,input( AGEMOM ,best32.) as AGEMOM
               ,input( AGEKID ,best32.) as AGEKID
835            ,*
836          from
837             classchr
838          ;

Super Contributor
Posts: 253

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to rogerjdeangelis

While I agree with the others that you should solve the input problem, there is (in my mind) a somewhat better way to do this.

 

Something like this: 

%macro c_to_n(var=);
  &var._n = input(&var.,best32.);
  rename &var._n = &var.;
  drop &var.;
%mend c_to_n;

 

Then you can just insert calls to it.

 

data want;
  set have;
  %c_to_n(var=charvar);
run;

And it will be renamed, you don't have to worry about data step options etc.  The only risk here is that you must be cognizant of which variable is which if you're doing anything to/with it in that same datastep (so &var. is the char and &var._n is numeric, and only use &var. before the call and &var._n after the call).

 

You can of course write a similar n_to_c though you'll want to pass a length argument there.  And it's relatively easy to create these calls from your data (you can either identify the variables from some sort of data dictionary, or from a PROC FREQ that shows they are all numeric, for example).

Super User
Posts: 11,343

Re: Is there a better way to convert variables from character to numeric and vice versa?

Posted in reply to snoopy369

snoopy369 wrote:

While I agree with the others that you should solve the input problem, there is (in my mind) a somewhat better way to do this.

 

Something like this: 

%macro c_to_n(var=);
  &var._n = input(&var.,best32.);
  rename &var._n = &var.;
  drop &var.;
%mend c_to_n;

 

 


If this a continuation of early posts from the OP your solution, while reasonable in many cases ( I know I seldom create variables with names more than 12 or so characters), isn't currently feasible has the original proc import created many variables already at maximum length for variable names. So the appending _n to the existing will not be available. Which is something else to fix on reading - shorter meaningful names.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 16 replies
  • 527 views
  • 10 likes
  • 7 in conversation