Hi,
I am trying to do proc import from Excel workbook to load some empty fields which will be later added with integer values. The dataset is guessing rows in a way that string datatype is assigned to these variables. Please help.
Thanks
Anjali
%macro abc;
proc options option=validvarname;
run;
proc import replace datafile= "/SASAnalytics/DA/Daily_File_20200516.xlsx"
out=DAShare2.DSN_TEMP
DBMS=XLSX;
run;
proc export data=NEW dbms=csv
outfile="temp.csv"
replace;
run;
%let varlist= cookie $28 date $10 timestamp $28 Last_url_visited $800 ip $28 stage $28 device $28 os $28 browser $28 Page $28 Product $28
SubProduct $120 Pname $8 cookieBirth $8 typnmarr $28 dslv 5 last_10_urls_visited $400 taxonomy $400 dsdo 8 clickid $28 DSLV_cards 8
DSLV_accounts 8 DSLV_loans 8 DSLV_personal_loan_expats 8 DSLV_bca_silver 8 DSLV_betaqti_credit_card 8 DSLV_etihad_GA_account 8
DSLV_credit_cards 8 DSLV_touchpoint_ic 8 DSLV_privilege_C_Plat_DC 8 DSLV_millionaire_D_Saving_Acc 8 DSLV_traveller_credit_card 8
DSLV_adcb_titanium_debit_card 8 DSLV_touchpoint_platinum_cc 8 DSLV_fixed_deposit_account 8 DSLV_active_saver_account 8 DSLV_excellence_plat_DC 8
DSLV_etihad_guest_Abo_Exce 8 DSLV_personal_loan_uaeNationl 8 DSLV_etihad_guest_Infin_CC 8 DSLV_personal_finance_for_expats 8
DSLV_personal_finance_UAENation 8 DSLV_etihad_guest_privilege_acc 8 DSLV_etihad_guest_excellency_acc 8 DSLV_etihad_platinum_credit_card 8
DSLV_etihad_guest_accounts 8;
data dsn;
informat <put your varlist here> ;
infile "test.csv" <options> ;
input <put your varlist here> ;
set DAShare2.DSN_temp;
run;
%mend;
%abc;
Save the required sheet(s) to csv file(s), and read those with a data step where you have full control over the import process.
@Kurt_Bremser There are around 200 xlsx , I wont be able to manually convert them to csv.
@SASUserRocks wrote:
@Kurt_Bremser There are around 200 xlsx , I wont be able to manually convert them to csv.
Using the worst imaginable file format for mass data transfer is at best stupid. No matter what you do, you end up with manual conversions and/or fixes, because every single import will end up with more or less subtle differences.
You might be able to automate a xlsx to csv conversion with LibreOffice or VBA.
Have the data sent in a sensible format in the first place.
Excel is only good for manual tinkering, not for automatable (and reliably automatable) processes.
Please try to use the mixed=yes option to avoid the issue
PROC IMPORT OUT= mydata
DATAFILE= "J:\users\nsouli\Debugging.xls"
DBMS=EXCEL REPLACE;
SHEET="'SAS$'";
GETNAMES=YES;
MIXED=yes;
RUN;
@Jagadishkatam Tried below code and its throwing error "ERROR: DBMS type EXCEL not valid for import."
proc import replace datafile= "/SASAnalytics/DA/Daily_File_20200516.xlsx"
out= DAShare2.Test
dbms=excel ;
DBDSOPTS= "DBTYPE=(DSLV_personal_finance_UAENation='NUM(8)')";
MIXED=yes;
run;
You can try to convert your excel file in csv / txt. Then you can use a DATA step to import your file and specify the type of your variables with the correct informat in an INPUT statement.
Best,
There are huge number of xlsx files that I will receive everyday and not willing to create overhead at SAS server.
Anjali,
You keep mentioning large amount of excel files is it 100s, 1000s. I hope you do not process them manually. Unfortunately SAS has minimal control on reading data from excel as opposed to other formats. If you know what the fields are and what the formats should be you can do explicit conversion after reading it into a sas dataset.
Hi @smantha , Currently its 200, but this is daily incremental file that keeps coming till future date. I am trying below code to work.
options validvarname=any;
%macro abc;
proc import replace datafile= "/SASAnalytics/DA/Weblog_Lemnisk/ProdData/NewSet/Daily_File_20200516.xlsx"
out=DAShare2.DSN_TEMP
DBMS=XLSX;
%let varlist= cookie $20 date $10 timestamp $20 Last_url_visited $800 ip $20 stage $20 device $20 os $20 browser $20 Page $120 Product $120
SubProduct $120 Pname $120 cookieBirth $20 typnmarr $120 dslv 5 last_10_urls_visited $400 taxonomy $400 dsdo 20 clickid $120 DSLV_cards 20
DSLV_accounts 20 DSLV_loans 20 DSLV_personal_loan_expats 20 DSLV_bca_silver 20 DSLV_betaqti_credit_card 20 DSLV_etihad_GA_account 20
DSLV_credit_cards 20 DSLV_touchpoint_ic 20 DSLV_privilege_C_Plat_DC 20 DSLV_millionaire_D_Saving_Acc 20 DSLV_traveller_credit_card 20
DSLV_adcb_titanium_debit_card 20 DSLV_touchpoint_platinum_cc 20 DSLV_fixed_deposit_account 20 DSLV_active_saver_account 20 DSLV_excellence_plat_DC 20
DSLV_etihad_guest_Abo_Exce 20 DSLV_personal_loan_uaeNationl 20 DSLV_etihad_guest_Infin_CC 20 DSLV_personal_finance_for_expats 20
DSLV_personal_finance_UAENation 20 DSLV_etihad_guest_privilege_acc 20 DSLV_etihad_guest_excellency_acc 20 DSLV_etihad_platinum_credit_card 20
DSLV_etihad_guest_accounts 20;
data dsn;
length &varlist ;
set DAShare2.DSN_temp;
format _all_;
run;
%mend;
%abc;
First you can check if the formats of variable from the excel read is consistent with what you expect. If not then write out explicit conversions. 200 files is not that many, being xlsx i can guess at most a million rows, unless the number of sheets you read is more than 5. Can you request whoever is generating these files to write as csv?
20 is not a valid length for numeric variables (maximum is 8). Your code will also not work if a column was imported with the wrong type (numeric vs. character).
I can only repeat: from 20+ years of SAS experience, using Excel files for this is madness.
Accept that fact. We will get files in CSV format in future , until then I have to process xlsx files for the dashboard.
Close. Try this instead. Use PROC IMPORT to read it in. Use data step to write it out to temporary CSV file. Then use another data step to read in the data from the CSV file.
The VALIDVARNAME=ANY doesn't really do you any good, none of the names you want to use require it. The length for numeric variables should be 8 since SAS stores them as 8 byte floating point numbers. If you have any actual date, time or datetime values then you will want to add a FORMAT statement to the data step that creates the final dataset from the temporary csv file. But you don't need to write formatted values to the temporary CSV file, so no need for any INFORMAT statements.
%macro abc(infile,dsn);
* Import sheet ;
proc import dbms=xlsx out=_temp_ replace
datafile="/SASAnalytics/DA/Weblog_Lemnisk/ProdData/NewSet/&infile..xlsx"
;
run;
* Write to CSV file ;
filename csv temp;
data _null_;
file csv dsd ;
set _temp_;
put (_all_) (+0);
format _numeric_ best32. _character_;
run;
* Read from CSV file ;
data &dsn;
length
cookie $20
date $10
timestamp $20
Last_url_visited $800
ip $20
stage $20
device $20
os $20
browser $20
Page $120
Product $120
SubProduct $120
Pname $120
cookieBirth $20
typnmarr $120
dslv 8
last_10_urls_visited $400
taxonomy $400
dsdo 8
clickid $120
DSLV_cards 8
DSLV_accounts 8
DSLV_loans 8
DSLV_personal_loan_expats 8
DSLV_bca_silver 8
DSLV_betaqti_credit_card 8
DSLV_etihad_GA_account 8
DSLV_credit_cards 8
DSLV_touchpoint_ic 8
DSLV_privilege_C_Plat_DC 8
DSLV_millionaire_D_Saving_Acc 8
DSLV_traveller_credit_card 8
DSLV_adcb_titanium_debit_card 8
DSLV_touchpoint_platinum_cc 8
DSLV_fixed_deposit_account 8
DSLV_active_saver_account 8
DSLV_excellence_plat_DC 8
DSLV_etihad_guest_Abo_Exce 8
DSLV_personal_loan_uaeNationl 8
DSLV_etihad_guest_Infin_CC 8
DSLV_personal_finance_for_expats 8
DSLV_personal_finance_UAENation 8
DSLV_etihad_guest_privilege_acc 8
DSLV_etihad_guest_excellency_acc 8
DSLV_etihad_platinum_credit_card 8
DSLV_etihad_guest_accounts 8
;
infile csv dsd truncover ;
input (_all_) (+0);
run;
%mend;
*options validvarname=any;
%abc(Daily_File_20200516,dsn1)
%abc(Daily_File_20200517,dsn2)
How similar in structure are these files? This means column order with same intended values/types?
How many header rows (a major bane of Proc Import)? Does this number change in files of the same "structure".
If the files have a similar structure you might be able to use either LIBNAME PCFILES or EXCEL to access the data. Then you would have meta data information available available from the SASHELP.Vcolumns or Dictionary.Columns when using Proc SQL. Then write/ modify code to create new SAS datasets as needed, variable names, types, labels and such using data step code. But if you have lots of different layouts this can get cumbersome because you will have to have code that examines pretty much each column of the spreadsheet with the desired type and implement the correct code for conversion.
The details are important though. If the column orders or numbers of header rows change then you can get into very complicated headaches in attempting to automate. I have one source of XLSX I have to deal with where the 1) column orders change from month to month, 2) the number of header rows can change from month to month and 3) the "values" of a file or personal identification value will change to DATE formats partway through the file. And I have no way I can think of that would "automate" that conversion with out testing every single value in every single column.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.