Hi all.
I'm currently using excel for data input into SAS for analysis and cleaning. It's beginning to give me issues when it comes to merging datasets. Are there any programs that I could use that would allow me to input an excel spreadsheet into the programt that could then be input into SAS as a dataset? I currently use SAS 9.4, and am using PROC IMPORT to import Excel spreadsheets.
Use a text file (CSV) with manual control.
You have full control over the type and length when reading in the data.
I've tried that, but I've come across the issue of variables being defined as both numeric and character.
You don't say how you are getting the data into SAS. I am willing to make a small wager that you are using Proc Import which is almost a gaurateed failure over time. Save the Excel data to CSV. Then write a data step to read the CSV. Proc Import 'guesses' as to the field size and type every time it is run. So variable lengths change and change type depending on the content. Even if you mean 123 to be character if all of the values in a column look like that the guessing algorithm will set them as numeric.
Use proc import once on a CSV to get skeleton of a program with informats, formats and input statements. You can copy that from the log (or use RECALL command) to edit the result. Look at the statements and make decisions for variable type and size to match you input design (which Excel will completely ignore). Then once you have the program working all you need to do is point the infile to a new file and the output to a new output data set name.
The data step is also a good place to add meaningful variable labels which will generally not happen using proc import.
Apologies, yes I am using PROC IMPORT. Would it be possible to for you to provide an example of what you're describing? I've only ever used the import procedure to read in excel files.
SAS Innovation: Programatically clean excel data, determine type and length on the excel side before import
1. You were are given an excel file from headquarters and asked to import the data to SAS.
2. You are concerned about type, length and bad data
3. You would like to clean the data on the excel side and determine all types and lengths before importing
With the libname and connection engines opart of base SAS.
It may be possible, and I have not tried this, to create an excel
view of any database that you have an ODBC driver and use the code below to import from
Oracle, MS-Access, SQL-Server...
HAVE EXCEL NAMED RANGE 'CLASS' FROM HEADQUARTERS
==================================================
WE HAVE NOT IMPORTED THE DATA TO SAS YET
EXCEL.class total obs=19
Obs WGTMIX NAME SEX AGE HEIGHT
1 113 Alfred M 14 69.0
2 A0084 Alice F 13 56.5
3 98 Barbara F 13 65.3
4 A0103 Carol F 14 62.8
5 103 Henry M 14 63.5
6 A0083 James M 12 57.3
WHAT WE WANT
1. Determine type and length
2. Fix WGTMIX by converting A0084 to 84, A0103 to 103 ...
BEFORE IMPORTING TO SAS
Number of rows with character data (from excel data not imported)
NumObs NumChrName NumChrSex NumChrAge NumChrHgt NumChrWgtMix
----------------------------------------------------------------------
19 19 19 0 0 9
From this we know
Name and Sex are all character
Age and Height are all numeric
Weight is both numeric and non-numeric (show you how to get nulls later)
Max length for each variable
MaxLenName MaxLenSex MaxLenAge MaxLenWgt MaxLenHgt
---------------------------------------------------------------
7 1 2 5 4
* This what we need to import cleaned data;
libname xls "d:/xls/vueclass.xlsx";
data class;
set xls.class(dbsastype=
(
Name = 'char(7)'
Sex = 'char(1)'
Age = 'numeric'
Weight = 'numeric'
Height = 'numeric'
));
run;quit;
libame xls clear;
SOLUTION ( CREATE THE EXCEL INPUT)
==================================
libname xel "d:\xls\class.xlsx" scan_text=no;
data xel.class;
retainname sex age height wgtmix;
length wgtmix $12;
set sashelp.class;
if mod(_n_,2)=0 then wgtmix=cats("A",put(weight,z4.));
else wgtmix=put(weight,4.);
keep name sex age height wgtmix;
run;quit;
libname xel clear;
/* how many columns have at least one character cell */
proc sql dquote=ansi;
connect to excel (Path="d:\xls\class.xlsx" mixed=yes);
select * from connection to Excel
(
Select
count(*) as NumObs
,count(*) + sum(isnumeric(name)) as NumChrName
,count(*) + sum(isnumeric(sex)) as NumChrSex
,count(*) + sum(isnumeric(age)) as NumChrAge
,count(*) + sum(isnumeric(height)) as NumChrHgt
,count(*) + sum(isnumeric(wgtmix)) as NumChrWgtMix
from
class
);
disconnect from Excel;
Quit;
Number of observations that have at least one character variable.
NumObs NumChrName NumChrSex NumChrAge NumChrWgtMix
---------------------------------------------------------
19 19 19 0 9
/* what is the max length of the columns */
proc sql dquote=ansi;
connect to excel (Path="d:\xls\class.xlsx" mixed=yes);
select * from connection to Excel
(
Select
max(len(name)) as MaxLenName
,max(len(sex)) as MaxLenSex
,max(len(age)) as MaxLenAge
,max(len(wgtmix)) as MaxLenWgtMix
,max(len(height)) as MaxLenHgt
from
class
);
disconnect from Excel;
Quit;
/*
MaxLenName MaxLenSex MaxLenAge MaxLenWgtMix MaxLenHgt
--------------------------------------------------------------
7 1 2 5 4
*/
HERE IS THE FINAL IMPORT
proc sql dquote=ansi;
connect to excel (Path="c:\xls\class.xlsx" mixed=yes);
create table xls_class as
select
ChrName length=7
,ChrSex length=1
,ChrWgtMix length=5
,input(ChrAge,4.) as NumAge
,input(ChrHgt,5.1) as NumHgt
,input(ChrWgtFix,4.) as NumWgt
from connection to Excel
(
/* make the changes on the excel size - send this to headquarters for confirmation */
/* formats provide documentation on what should be in the cells */
/* all fields ar character here - which is what they really are originally? */
Select
Name as ChrName
,Sex as ChrSex
,format(Age,'###') as ChrAge
,WgtMix as ChrWgtMix
,format(Height,'###.#') as ChrHgt
/* remove the leading A, make it an integer and format to 3 digits -99 to 999 */
,format(int(iif(mid(WgtMix,1,1)="A",mid(WgtMix,2,4),WgtMix)),'###') as ChrWgtFix
from
class
);
disconnect from Excel;
Quit;
proc sql;
drop table xel.class; /* only useful if rerunning */
quit;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!