Hi,
I am attempting to import multiple CSV files (>500) into SAS and they all have the same variable formats.
There are a total of 16 variables of which 2 should be character and 14 should be numeric.
I have tried doing a proc import, however all fields were assigned to character by default.
I then moved onto the data infile method and renamed the 14 variables from character ($4.) to best.
This is causing problems as the data for 4/14 variables contains a '-' where there is no number present.
For example (one variable):
Volume |
1,096,532 |
- |
63,741 |
978,134 |
How do I go about setting the numeric format? Is there another format besides 'best.' that can be used to accomodate for the '-'?
Any help would be greatly appreciated.
SAS log shows:
NOTE: Invalid data for Volume in line 16 52-57
First thing is don't use PROC IMPORT for files that you already KNOW the format. It will make guesses at how to define each variable. And it will make that guess for each file independently. Write your own data step to read the file. You can even read all of them in one data step if you want.
To your question about how to read in a hyphen as meaning a missing value there here are some simple solutions.
1) Just let SAS change it to missing for you. You can add the ? or ?? modifiers to the INPUT (or INPUT() function) to suppress the errors.
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume :??comma. ;
run;
2) Read it as character and program around the value.
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume_raw :$20. ;
if volume_raw not in ('-',' ') then volume=input(volume_raw,comma32.) ;
run;
3) Make your own informat and use that.
proc format ;
invalue hyphen '-'=. other=[comma32.] ;
run;
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume hyphen.;
run;
Numbers like that are read with the comma. informat. You will only get a NOTE for invalid data for the hyphen. If you want to avoid that, do
input _number :$20.;
if _number ne '-' then number = input(_number,comma20.);
drop _number;
(it is recommended to make your code clean so that it does not throw NOTEs as mentioned above in normal operation)
Edit: added a missing underline in the input function
Using
options dsoptions=note2err;
turns all unexpected notes into error messages.
First thing is don't use PROC IMPORT for files that you already KNOW the format. It will make guesses at how to define each variable. And it will make that guess for each file independently. Write your own data step to read the file. You can even read all of them in one data step if you want.
To your question about how to read in a hyphen as meaning a missing value there here are some simple solutions.
1) Just let SAS change it to missing for you. You can add the ? or ?? modifiers to the INPUT (or INPUT() function) to suppress the errors.
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume :??comma. ;
run;
2) Read it as character and program around the value.
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume_raw :$20. ;
if volume_raw not in ('-',' ') then volume=input(volume_raw,comma32.) ;
run;
3) Make your own informat and use that.
proc format ;
invalue hyphen '-'=. other=[comma32.] ;
run;
data want ;
infile 'myfile.csv' dsd truncover firstobs=2 ;
input volume hyphen.;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.