DATA Step, Macro, Functions and more

proc import creating numeric variables as character

Accepted Solution Solved
Reply
Super Contributor
Posts: 398
Accepted Solution

proc import creating numeric variables as character

I have a text file that is comma delimited and it has double quotes around the character fields.  I have one header row and almost 1500 data rows.

 

When I run this code it returns columns that are supposed to be numeric as characer.  How can I fix this?

 

proc import
    datafile="c:\temp\DBData.txt"
    dbms=dlm
    out=tempData
    replace ;
    delimiter=","
    ;
    getnames=yes;
    guessingrows=max ;    
run ;

 

Thank you


Accepted Solutions
Solution
‎09-25-2015 06:23 AM
Super Contributor
Posts: 398

Re: proc import creating numeric variables as character

Hi ballardw,

 

Thank you for all your suggestions.  I was able to convince my lead to take the file to excel and create an xlsx file first before importing.  That fixed it.  There is more work that I will have to do but at least I'm variables I checked are getting the right types.

 

Thanks again for your help

View solution in original post


All Replies
Super User
Posts: 10,466

Re: proc import creating numeric variables as character

Are any of the "numeric" fields coming with quotes? Dollar other currency signs? Commas? Parantheses?

If quotes that would be why as quoted values are assumed to be text.

Proc Import does not do a great job of guessing that values with other characters such as $10,   123,456 or (23456.78) should be treated as numeric. If those are problems then you need a data step to read the file and assign the proper format. The log should contain code generated by proc import that you can copy from the log and paste into the editor and assign proper informats.

 

Super Contributor
Posts: 398

Re: proc import creating numeric variables as character

Hi ballardw,

 

Thank you for your reply.  I have 11,664 variables in this text file that I'm reading in.  The 3rd column is a year column so all the data

within it is a 4 digit numeric value.  There are no special characters within that field.   Could the first line being the variable names and text cause it to default to character? 

 

Super User
Posts: 10,466

Re: proc import creating numeric variables as character

try adding
Datarow=2;
Super Contributor
Posts: 398

Re: proc import creating numeric variables as character

Hi ballardw,

 

I tried the datarow=2, but that didn't fix it either. 

I also noticed that the number of variables is incorrect when I do proc import.  Does proc import automatically create valid sas names if the variable name is not?

 

My variable names have "[" "]" as part of the names.  I'm losing about 4,000 variables.

 

Thank you

Super User
Posts: 10,466

Re: proc import creating numeric variables as character

From the online documentation

The maximum LRECL width that the IMPORT procedure supports is 32767.

I suspect that since you have thousands of variables that your column headings and number of characters per row are exceeding 32,767 characters.

 

So you'll need to write datastep code to read the data. Code generated in the log for what you have read so far will show you a start.

You will need to specify an large enough length for the longest value of any character variables; date, time or datetime variables will work much better if read with a matching informat. Since you will be specifying informats anything note character or date/time/datetime will default to numeric if not specified.

 

Proc import will generate a unique variable name for each variable it imports though if the original column headings are identical for more than 32 charaters you may get VARXX where XX is the number of the variable. Non- alphanumeric or _ will be replaced with _. Identical column headings will also get a numeric suffix to differentiate them.

 

If you have a document that describes the incoming data then I would use that while writing the the datastep.

 

HINT: Assigning labels is the way SAS allows you to assign more information about the variable then the 32 characters for the variable name.

Solution
‎09-25-2015 06:23 AM
Super Contributor
Posts: 398

Re: proc import creating numeric variables as character

Hi ballardw,

 

Thank you for all your suggestions.  I was able to convince my lead to take the file to excel and create an xlsx file first before importing.  That fixed it.  There is more work that I will have to do but at least I'm variables I checked are getting the right types.

 

Thanks again for your help

Super User
Super User
Posts: 6,497

Re: proc import creating numeric variables as character

If you don't know what is in the file then the odds are pretty good that reading it into Excel will destroy some of the information.  It will lose leading zeros from id values.  Interpret digits with hypens as dates. etc.

 

Super User
Super User
Posts: 6,497

Re: proc import creating numeric variables as character

jerry898969 wrote:

Hi ballardw,

 

Thank you for your reply.  I have 11,664 variables in this text file that I'm reading in.  The 3rd column is a year column so all the data

within it is a 4 digit numeric value.  There are no special characters within that field.   Could the first line being the variable names and text cause it to default to character? 

 


11,664 variables is way too many for almost anything.  The values would have to all be 1 digit to fit within the default 32K record length for reading text files?  Have you not been given any information on what is in this file?  No data dictionary or other document that you could use to generate the code to read the file?  If you really have nothing then I would recommend reading it into a tall skinny table and analize it yourself to determine which columns are numeric and which are character.

 

data tall ;
   infile 'myfile.csv' truncover dsd firstobs=2 lrecl=1000000 length=ll column=cc ;
   row+1;
   do col=1 by 1 while (cc < ll);
     length str $200 value 8 ;
     input str @;
     value = input(str, ??comma32.);
     output;
   end;
run;
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 351 views
  • 0 likes
  • 3 in conversation