Hi there,
I have a dataset and few columns have numeric values but when I reading that excel using the proc import SAS is reading them as character. Is there a way if I can retain the type while proc importing?
for eg : sd_id has values as 3,4,5,6,7,8 similarly I have other few columns causing the same issue but sas is reading it as char. so currently we can see the specific columns showing this behavior but when it goes to production environment we are not even sure if other numeric columns are start getting read as char. Is there a way we can handle this issue.
Regards
Kajal
@kajal_30 wrote:
one more concern while creating the csv file from dataset I can see there are 2 empty lines after every record. but actual excel file doesn't have any empty lines.
- importing xlsx as dataset
- exporting dataset into a csv file
May I know why am I seeing these empty lines.
regards
Kajal
If the XLSX file has end of line characters in it then that will make the CSV file unusable. Remove them.
data _null_:
set have;
file csv dsd ;
array _character_ _character_ ;
do over _character_;
_character_=translate(_character_,' ','0D0A'x);
end;
put (_all_) (+0);
run;
@kajal_30 wrote:
Hi there,
I have a dataset and few columns have numeric values but when I reading that excel using the proc import SAS is reading them as character. Is there a way if I can retain the type while proc importing?
for eg : sd_id has values as 3,4,5,6,7,8 similarly I have other few columns causing the same issue but sas is reading it as char. so currently we can see the specific columns showing this behavior but when it goes to production environment we are not even sure if other numeric columns are start getting read as char. Is there a way we can handle this issue.
Regards
Kajal
Since EXCEL does not actually have the concept of a variable there is no TYPE involved at all.
Proc Import expects simple data layouts such as at maximum one row of column headings that might be treated as variable names. I see way too many Excel workbooks with multiple rows of header information. So the second (and or 3rd/4th/etc rows) are not actually numeric and the column gets treated as character because there are character values.
In a production environment you really don't want to use import. Not just because of this issue but variable lengths can change because SAS makes the decision for such separately for each file read. If you data source is sloppy you may also have issues with variable names changing as well as type.
The most reliable way to get specific properties for "columns" is to 1) save the file to a text format such as CSV and 2) use a data step to read that resulting file with a data step. One of the options in the INFILE used for reading with a data step allows you to specify which is the first line of actual data.
Very informative @ballardw ! Thanks!
I am facing the reverse issue. SAS is reading character values as numeric because of the above issue in proc import and now I get that proc import is not a good idea but reading file by data step. Thanks again.
- Dr. Abhijeet Safai
"Production" and "Excel" are mutually exclusive. Excel files are simply useless as a medium for data transfer.
Save your data as text files (e.g. csv), which you can read with a custom written data step, so you have full control and get consistent results.
There are two situations that can happen.
1) There is some non numeric cell in that column.
2) All of the cells are empty. In that case PROC IMPORT will create the variable as character of length 1 since that will take 7 bytes less storage in the SAS dataset than a numeric variable.
In either case you have choices.
1) Fix the Excel file so the column only contains numeric values and at least one of them is not empty.
2) Save the data from the Excel into a different format. Either one that enforces variable types, like a DBF file. Or a text file that you can read with a data step.
3) Post process the SAS dataset to try to convert the character values to a numeric variable.
We will have no control over the excel file as it will be directly be provided by business to the process. so no manual intervention will be allowed.
@kajal_30 wrote:
We will have no control over the excel file as it will be directly be provided by business to the process. so no manual intervention will be allowed.
You can convert the useless Excel files to csv in batch, e.g. with LibreOffice. The Excel files stay as they are, and the csv files allow consistent reading into SAS.
Thanks @Kurt_Bremser actually we only base sas available. do we have a base sas code for playing the same role as libre office?
@kajal_30 wrote:
Thanks @Kurt_Bremser actually we only base sas available. do we have a base sas code for playing the same role as libre office?
Not really. XLSX files are ZIP archives of XML files, so they would be quite complex to read. You may find utilities for batch MS Office conversion on the internet.
So if you don't want to (or can't) install the necessary software, request data in a reasonable format in the first place. I am quite appalled that a "professional" shop uses Excel files as a transport medium.
Here's a VBS routine, SAS generated/called that converts an XLSX file to CSV.
https://gist.github.com/statgeek/878e585102c14e01581f55dbe972d27e
You can make the check/fix step as complex as you need.
Here is a simple way, but it might still have trouble with somethings, like DATE fields.
Import the Excel sheet into a dataset.
Write the data from that dataset into a text file.
Read the text file into a dataset with the expected format. If you have an existing file that has the variables defined the way you want and in the same order as the columns in the XLSX file it is very simple.
proc import datafile='myfile.xlsx' dbms=xlsx out=step1 replace;
run;
filename csv temp;
data _null_;
set step1;
file csv dsd ;
put (_all_) (+0);
run;
data want;
if 0 then set template ;
infile csv dsd truncover;
input (_all_) (+0);
run;
Hi Tom,
AS I am reimporting the csv to a dataset I can see 0 variables . Is there a way we can retain column names while importing from a csv file ? refraining from using input statement as in future business might change the sequence of the columns in the sheet
Regards
Kajal
If we're talking about production, a change in file structure must be documented before it is implemented, so your codes can be adapted. So the INPUT statement is a non-issue.
ok I can take it as a fair statement. So while we are doing the import from csv file the imported data without columns will follow the same sequence as we have in the file or it can be different.
for eg in csv file we have
cust_no region
100 USA
so for imported dataset can I put
input cust_no region should be ok ?
regards
Jagdeep
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.