BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jerry898969
Pyrite | Level 9

I have a text file that is comma delimited and it has double quotes around the character fields.  I have one header row and almost 1500 data rows.

 

When I run this code it returns columns that are supposed to be numeric as characer.  How can I fix this?

 

proc import
    datafile="c:\temp\DBData.txt"
    dbms=dlm
    out=tempData
    replace ;
    delimiter=","
    ;
    getnames=yes;
    guessingrows=max ;    
run ;

 

Thank you

1 ACCEPTED SOLUTION

Accepted Solutions
jerry898969
Pyrite | Level 9

Hi ballardw,

 

Thank you for all your suggestions.  I was able to convince my lead to take the file to excel and create an xlsx file first before importing.  That fixed it.  There is more work that I will have to do but at least I'm variables I checked are getting the right types.

 

Thanks again for your help

View solution in original post

8 REPLIES 8
ballardw
Super User

Are any of the "numeric" fields coming with quotes? Dollar other currency signs? Commas? Parantheses?

If quotes that would be why as quoted values are assumed to be text.

Proc Import does not do a great job of guessing that values with other characters such as $10,   123,456 or (23456.78) should be treated as numeric. If those are problems then you need a data step to read the file and assign the proper format. The log should contain code generated by proc import that you can copy from the log and paste into the editor and assign proper informats.

 

jerry898969
Pyrite | Level 9

Hi ballardw,

 

Thank you for your reply.  I have 11,664 variables in this text file that I'm reading in.  The 3rd column is a year column so all the data

within it is a 4 digit numeric value.  There are no special characters within that field.   Could the first line being the variable names and text cause it to default to character? 

 

jerry898969
Pyrite | Level 9

Hi ballardw,

 

I tried the datarow=2, but that didn't fix it either. 

I also noticed that the number of variables is incorrect when I do proc import.  Does proc import automatically create valid sas names if the variable name is not?

 

My variable names have "[" "]" as part of the names.  I'm losing about 4,000 variables.

 

Thank you

ballardw
Super User

From the online documentation

The maximum LRECL width that the IMPORT procedure supports is 32767.

I suspect that since you have thousands of variables that your column headings and number of characters per row are exceeding 32,767 characters.

 

So you'll need to write datastep code to read the data. Code generated in the log for what you have read so far will show you a start.

You will need to specify an large enough length for the longest value of any character variables; date, time or datetime variables will work much better if read with a matching informat. Since you will be specifying informats anything note character or date/time/datetime will default to numeric if not specified.

 

Proc import will generate a unique variable name for each variable it imports though if the original column headings are identical for more than 32 charaters you may get VARXX where XX is the number of the variable. Non- alphanumeric or _ will be replaced with _. Identical column headings will also get a numeric suffix to differentiate them.

 

If you have a document that describes the incoming data then I would use that while writing the the datastep.

 

HINT: Assigning labels is the way SAS allows you to assign more information about the variable then the 32 characters for the variable name.

jerry898969
Pyrite | Level 9

Hi ballardw,

 

Thank you for all your suggestions.  I was able to convince my lead to take the file to excel and create an xlsx file first before importing.  That fixed it.  There is more work that I will have to do but at least I'm variables I checked are getting the right types.

 

Thanks again for your help

Tom
Super User Tom
Super User

If you don't know what is in the file then the odds are pretty good that reading it into Excel will destroy some of the information.  It will lose leading zeros from id values.  Interpret digits with hypens as dates. etc.

 

Tom
Super User Tom
Super User
@jerry898969 wrote:

Hi ballardw,

 

Thank you for your reply.  I have 11,664 variables in this text file that I'm reading in.  The 3rd column is a year column so all the data

within it is a 4 digit numeric value.  There are no special characters within that field.   Could the first line being the variable names and text cause it to default to character? 

 


11,664 variables is way too many for almost anything.  The values would have to all be 1 digit to fit within the default 32K record length for reading text files?  Have you not been given any information on what is in this file?  No data dictionary or other document that you could use to generate the code to read the file?  If you really have nothing then I would recommend reading it into a tall skinny table and analize it yourself to determine which columns are numeric and which are character.

 

data tall ;
   infile 'myfile.csv' truncover dsd firstobs=2 lrecl=1000000 length=ll column=cc ;
   row+1;
   do col=1 by 1 while (cc < ll);
     length str $200 value 8 ;
     input str @;
     value = input(str, ??comma32.);
     output;
   end;
run;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 2601 views
  • 0 likes
  • 3 in conversation