Hi all SAS Users,
First of all, thank you for all your help so far. I think it is time for me to adopt CSV across my whole project. I know it is a pain, but because I deal with international data it is better to take the pain at an early stage (no fewer than 10 times, I think, @ballardw, @SASKiwi, @Reeza, and others suggested that I change to CSV, but I was stubborn because I was scared of changing..). I have walked through a lot of discussions on the forums (even my own), and the most common sentence when a question relates to an Excel problem is "The best bet is CSV".
Let me briefly introduce my dataset:
I have 64 files (workbooks) representing 64 countries. In each file, I have 45 sheets representing 45 variables.
And we know that a CSV file holds only one worksheet. @ChrisNZ and @Kurt_Bremser also suggested a macro to me that transforms all worksheets in one xlsx file into separate CSVs by using an Excel macro, as here. I have not yet tried the macro or the LibreOffice approach because they are still unfamiliar to me (but I will learn them if you keep telling me they are good ways to deal with CSV). So can you please explain to me how to use LibreOffice, or is there any other SAS-related or user-friendly way to create 45 CSV files from one workbook, for all workbooks, in a relatively error-free way?
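For illustration, the per-sheet CSV writing I am after could be sketched like this in Python (a minimal sketch, not a tested solution: the part that pulls sheets out of the .xlsx is assumed to come from a library such as openpyxl and is shown only as a comment; all names and paths are made up):

```python
# Sketch: write one CSV per worksheet. A "workbook" here is simply a
# mapping of sheet name -> list of rows; feeding it from a real .xlsx
# would need a reader such as openpyxl (an assumption, not shown live).
import csv
import tempfile
from pathlib import Path

def sheets_to_csvs(sheets, out_dir, stem):
    """Write <stem>_<sheet>.csv for every sheet; return the paths created."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, rows in sheets.items():
        path = out_dir / f"{stem}_{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in rows:
                # blank out missing cells instead of writing the text "None"
                writer.writerow(["" if v is None else v for v in row])
        written.append(path)
    return written

# Hypothetical openpyxl feed (untested here, names illustrative):
#   wb = load_workbook("ARGENTINAARS.xlsx", read_only=True, data_only=True)
#   sheets = {ws.title: list(ws.iter_rows(values_only=True)) for ws in wb.worksheets}
#   sheets_to_csvs(sheets, "datacsv/ARGENTINAARS", "ARGENTINAARS")

# tiny demo with made-up data (two sheets)
demo = {"Sheet1": [["Type", "1988"], ["A", None]],
        "Sheet2": [["Type", "1988"], ["B", 42]]}
demo_paths = sheets_to_csvs(demo, Path(tempfile.mkdtemp()), "DEMO")
```

Looping that over the 64 workbooks would give one folder of 45 CSVs per country.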
Secondly, should I put these CSV files into 64 folders, each containing the 45 CSV files that represent the 45 variables? Below is my code for dealing with the 64 Excel files via PROC IMPORT (successfully tested).
I include the Excel import code here for a clearer picture. I want the same result as the code below, but reading CSV files instead of Excel.
And one further question: should I use PROC IMPORT or an INFILE statement to import the CSV files, please?
OPTIONS MPRINT;
%macro ImportAndTranspose(
File=
, cur=
, outf=
, StartSheet=
, EndSheet=
);
%local i;
%do i = &StartSheet. %to &EndSheet.;
%if &i=1 %then
%do;
proc import datafile= "&File."
out= &outf.&i.
dbms= xlsx
replace;
range= "Sheet1$A:X";
getnames= yes;
run;
proc sort data= &outf.&i.;
by Type;
run;
%end;
%else %if &i=34 %then
%do;
proc import datafile= "&File."
out= &outf.&i.
dbms= xlsx
replace;
range= "Sheet&i.$A:AG";
getnames= yes;
run;
proc sort data= &outf.&i.;
by Type;
run;
proc transpose data= &outf.&i.
out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
by Type;
VAR '1988'N - '2019'N;
run;
data &outf.&i._outx;
set &outf.&i._out;
if s&i. in: ('NA', '$$', '') then s&i. = " ";/********/
run;
%end;
%else %if (&i ne 1) and (&i ne 34) %then
%do;
proc import datafile= "&File."
out= &outf.&i.
dbms= xlsx
replace;
range= "Sheet&i.$A:AG";
getnames= yes;
run;
proc sort data= &outf.&i.;
by Type;
run;
proc transpose data= &outf.&i.
out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
by Type;
VAR '1988'N - '2019'N;
run;
data &outf.&i._outx;
set &outf.&i._out;
if input(s&i., ?? 32.) = . then s&i.2=.;/*added line 21_Jan*/
else s&i.2=input(s&i., 32.);/*added line 21_Jan*/
/*if s&i. in: ('NA', '$$', '..') then s&i.2=.;
else if s&i. not in: ('NA', '$$', '..') then s&i.2=input(s&i., 32.);*/
drop s&i.;
rename s&i.2=s&i.;
run;
%end;
%end;
/* Merging*/
%let outfm = %substr(&outf,1,%length(&outf)-5);
data &outfm.merge1;
merge
%do i =&StartSheet.+1 %to &EndSheet.;
&outf.&i._outx(keep= type year s&i.)
%end;
;
by type year;
run;
data &outfm.merge2 (drop=NAME);
merge
&outf.&StartSheet.
&outfm.merge1
;
by type;
run;
*********************************************;
/* Delete intermediate datasets to free up work space */
*************;
%local i;
%do i = &StartSheet.+1 %to &EndSheet.;
* +1 because sheet 1 behaves differently;
proc delete data=&outf.&i.;
proc delete data=&outf.&i._out;
proc delete data=&outf.&i._outx;
%end;
*****************;
proc delete data=&outf.1;
proc delete data=&outfm.merge1;
**********************************************;
************************************************;
**EXPORT DATASET**;
/*https://communities.sas.com/t5/SAS-Programming/making-a-sas7bdat-file/td-p/417487*/
libname myfolder 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\2Cfiltering';
proc copy in=work out=myfolder;
select &outfm.merge2;
run;
quit;
/*proc delete data=&outfm.filter;*/
%mend;
/*Replicate all files in one folder*/
filename mydir 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\leniency & operating\dataxlsx';
data _null_;
did = dopen('mydir');
do i = 1 to dnum(did);
fname = scan(dread(did,i),1,'.');
/*fname=ARGENTINAARS for a particular country where last three characters(ARS) are currency and before that(ARGENTINA) is country name*/
length short_fn $29 currency $3;
short_fn= cats(substr(fname, 1,length(fname)-3),'_');
currency=substr(fname,length(fname)-2);
cmd=cats('%ImportAndTranspose(File=C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\leniency & operating\dataxlsx\',
strip(fname),
',cur=',currency,
',outf=',short_fn,'sheet,startsheet=1,endsheet=45);');
call execute(cmd);
end;
keep fname;
run;
I believe that, with your help, I can become more capable and professional in working with SAS.
Warmest regards,
Many thanks in advance,
Phil.
Hi @ChrisNZ
Mainly it is about the type and length of the numeric variables. It is very dangerous: I am afraid some of my numeric observations will be truncated, because the Excel import does not allow me to use GUESSINGROWS=MAX to scan all the observations before fixing a variable's final attributes.
For example, suppose a company's total assets in the 30th row have 10 digits, say $10bil., but the import effectively guesses from only 20 rows and the highest value in the first 20 observations is $999mil (9 digits). Then it makes sense that the 30th observation would be truncated. @Patrick even tested with 1 million obs, and GUESSINGROWS seemed to work through to the 1 millionth ob.
Practically speaking, I also tested US and UK data, and I saw that the maximum company assets in the Excel file matched those in the SAS dataset created by PROC IMPORT. But because there is no official document confirming that SAS scans all observations to determine the final attributes, the accuracy cannot be confirmed...
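To show the failure mode I worry about in miniature (a Python sketch of the general "guess from the first N rows" pitfall, not of how SAS actually implements GUESSINGROWS; the numbers are made up):

```python
# Pitfall sketch: infer a column width from only the first N rows, then
# clip later values to that width. A too-small guessing window truncates
# any longer value that appears after the scanned rows.
def import_with_guessing(values, guessing_rows=20):
    width = max(len(v) for v in values[:guessing_rows])  # scan first N rows only
    return [v[:width] for v in values]                   # later rows get clipped

col = ["999000000"] * 29 + ["10000000000"]  # row 30 holds $10bil (11 digits)
out = import_with_guessing(col, guessing_rows=20)
# out[29] is truncated to the 9-character width seen in the first 20 rows
full = import_with_guessing(col, guessing_rows=len(col))
# scanning every row keeps row 30 intact
```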
Warmest regards,
Phil.
Hi @Patrick
That's great, but I am still trying to find the whitepaper that mentions that the XLSX engine also scans all the rows in each column to determine the data length.
Can I ask what keyword I should search for? Thank you.
Warm regards.
Click on "here" in my previous post. That gets you to the mentioned paper.
Doesn't the DBSASTYPE= option help?
The way this is going, you could read all columns as strings, and then transform them.
This way you control the reading process.
There is a dedicated tool for XLSX-to-CSV conversion called xlsx2csv; look into that. The LibreOffice command-line conversion will always pick up the first sheet only.
A google search for "convert xlsx to csv with libreoffice in batch" might reveal further options.
Once you have a directory filled with CSV's, a single data step will read them all with consistent attributes.
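As a language-neutral illustration of that idea (a Python sketch, not the SAS data step itself; file names, columns, and values are made up for the demo):

```python
# Sketch: read every CSV in a directory in one pass, with column types
# you set explicitly instead of letting each file be guessed separately.
import csv
import tempfile
from pathlib import Path

def read_csv_dir(folder):
    """Read every *.csv in folder with the same explicit column types."""
    rows = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as f:
            for rec in csv.DictReader(f):
                rec["assets"] = float(rec["assets"])  # explicit type, no guessing
                rec["source"] = path.stem             # remember the origin file
                rows.append(rec)
    return rows

# tiny demo: two files with the same layout
tmp = Path(tempfile.mkdtemp())
(tmp / "ARGENTINA_s1.csv").write_text("Type,Year,assets\nA,1988,999000000\n")
(tmp / "BRAZIL_s1.csv").write_text("Type,Year,assets\nA,1988,10000000000\n")
rows = read_csv_dir(tmp)
```

Because the types are assigned explicitly for every file, the $10bil value is read intact no matter which file comes first.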