BookmarkSubscribeRSS Feed
emcclung2442
Calcite | Level 5

Hi all. 

 

I'm currently using excel for data input into SAS for analysis and cleaning. It's beginning to give me issues when it comes to merging datasets. Are there any programs that I could use that would allow me to input an excel spreadsheet into the programt that could then be input into SAS as a dataset? I currently use SAS 9.4, and am using PROC IMPORT to import Excel spreadsheets.

5 REPLIES 5
Reeza
Super User

Use a text file (CSV) with manual control.  

 

You have full control over the type and length when reading in the data.

emcclung2442
Calcite | Level 5

I've tried that, but I've come across the issue of variables being defined as both numeric and character. 

ballardw
Super User

You don't say how you are getting the data into SAS. I am willing to make a small wager that you are using Proc Import which is almost a gaurateed failure over time. Save the Excel data to CSV. Then write a data step to read the CSV. Proc Import 'guesses' as to the field size and type every time it is run. So variable lengths change and change type depending on the content. Even if you mean 123 to be character if all of the values in a column look like that the guessing algorithm will set them as numeric.

 

Use proc import once on a CSV to get skeleton of a program with informats, formats and input statements. You can copy that from the log (or use RECALL command) to edit the result. Look at the statements and make decisions for variable type and size to match you input design (which Excel will completely ignore). Then once you have the program working all you need to do is point the infile to a new file and the output to a new output data set name.

 

The data step is also a good place to add meaningful variable labels which will generally not happen using proc import.

emcclung2442
Calcite | Level 5

Apologies, yes I am using PROC IMPORT. Would it be possible to for you to provide an example of what you're describing? I've only ever used the import procedure to read in excel files.

rogerjdeangelis
Barite | Level 11
SAS Innovation: Programatically clean excel data, determine type and length on the excel side before import

1. You were are given an excel file from headquarters and asked to import the data to SAS.
2. You are concerned about type, length and bad data
3. You would like to clean the data on the excel side and determine all types and lengths before importing

With the libname and connection engines opart of base SAS. It may be possible, and I have not tried this, to create an excel view of any database that you have an ODBC driver and use the code below to import from Oracle, MS-Access, SQL-Server... HAVE EXCEL NAMED RANGE 'CLASS' FROM HEADQUARTERS ================================================== WE HAVE NOT IMPORTED THE DATA TO SAS YET EXCEL.class total obs=19 Obs WGTMIX NAME SEX AGE HEIGHT 1 113 Alfred M 14 69.0 2 A0084 Alice F 13 56.5 3 98 Barbara F 13 65.3 4 A0103 Carol F 14 62.8 5 103 Henry M 14 63.5 6 A0083 James M 12 57.3 WHAT WE WANT 1. Determine type and length 2. Fix WGTMIX by converting A0084 to 84, A0103 to 103 ... BEFORE IMPORTING TO SAS Number of rows with character data (from excel data not imported) NumObs NumChrName NumChrSex NumChrAge NumChrHgt NumChrWgtMix ---------------------------------------------------------------------- 19 19 19 0 0 9 From this we know Name and Sex are all character Age and Height are all numeric Weight is both numeric and non-numeric (show you how to get nulls later) Max length for each variable MaxLenName MaxLenSex MaxLenAge MaxLenWgt MaxLenHgt --------------------------------------------------------------- 7 1 2 5 4 * This what we need to import cleaned data; libname xls "d:/xls/vueclass.xlsx"; data class; set xls.class(dbsastype= ( Name = 'char(7)' Sex = 'char(1)' Age = 'numeric' Weight = 'numeric' Height = 'numeric' )); run;quit; libame xls clear; SOLUTION ( CREATE THE EXCEL INPUT) ================================== libname xel "d:\xls\class.xlsx" scan_text=no; data xel.class; retainname sex age height wgtmix; length wgtmix $12; set sashelp.class; if mod(_n_,2)=0 then wgtmix=cats("A",put(weight,z4.)); else wgtmix=put(weight,4.); keep name sex age height wgtmix; run;quit; libname xel clear; /* how many columns have at least one character cell */ proc sql dquote=ansi; connect to excel (Path="d:\xls\class.xlsx" mixed=yes); select * from connection to Excel ( Select count(*) as NumObs ,count(*) + sum(isnumeric(name)) as NumChrName ,count(*) + sum(isnumeric(sex)) as NumChrSex ,count(*) + sum(isnumeric(age)) as NumChrAge ,count(*) + sum(isnumeric(height)) as NumChrHgt ,count(*) + sum(isnumeric(wgtmix)) as NumChrWgtMix from class ); disconnect from Excel; Quit; Number of observations that have at least one character variable. NumObs NumChrName NumChrSex NumChrAge NumChrWgtMix --------------------------------------------------------- 19 19 19 0 9 /* what is the max length of the columns */ proc sql dquote=ansi; connect to excel (Path="d:\xls\class.xlsx" mixed=yes); select * from connection to Excel ( Select max(len(name)) as MaxLenName ,max(len(sex)) as MaxLenSex ,max(len(age)) as MaxLenAge ,max(len(wgtmix)) as MaxLenWgtMix ,max(len(height)) as MaxLenHgt from class ); disconnect from Excel; Quit; /* MaxLenName MaxLenSex MaxLenAge MaxLenWgtMix MaxLenHgt -------------------------------------------------------------- 7 1 2 5 4 */ HERE IS THE FINAL IMPORT proc sql dquote=ansi; connect to excel (Path="c:\xls\class.xlsx" mixed=yes); create table xls_class as select ChrName length=7 ,ChrSex length=1 ,ChrWgtMix length=5 ,input(ChrAge,4.) as NumAge ,input(ChrHgt,5.1) as NumHgt ,input(ChrWgtFix,4.) as NumWgt from connection to Excel ( /* make the changes on the excel size - send this to headquarters for confirmation */ /* formats provide documentation on what should be in the cells */ /* all fields ar character here - which is what they really are originally? */ Select Name as ChrName ,Sex as ChrSex ,format(Age,'###') as ChrAge ,WgtMix as ChrWgtMix ,format(Height,'###.#') as ChrHgt /* remove the leading A, make it an integer and format to 3 digits -99 to 999 */ ,format(int(iif(mid(WgtMix,1,1)="A",mid(WgtMix,2,4),WgtMix)),'###') as ChrWgtFix from class ); disconnect from Excel; Quit; proc sql; drop table xel.class; /* only useful if rerunning */ quit;

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

Discussion stats
  • 5 replies
  • 1192 views
  • 0 likes
  • 4 in conversation