BookmarkSubscribeRSS Feed
Phil_NZ
Barite | Level 11

Hi all SAS Users,

 

First of all, thank you for all your bits of help so far. I think it is time for me to apply the CSV for my whole project. I know it is a pain but because I deal with international data, so it is better that I  got pain at the early stage (I think no less than 10 times that @ballardw, @SASKiwi , and @Reeza,, and others implied me to change to CSV but I was so stubborn because I was scare of changing..). I walked through a lot of discussions in the forums (and even in my discussions), the very common sentence is "The best bet is CSV" when a question relates to an excel problem.

 

Let me briefly introduce my dataset:

I have 64 files (workbooks) representing 64 countries. In each file, I have 45 sheets representing 45 variables. 

And we know that CSV only read one worksheet per time. And @ChrisNZ and @Kurt_Bremser also suggest to me a macro to transform all worksheets in one xlsx file to all separate CSV by using Excel macro as here. I did not check the macro and the LibreOffice program because they are still ambiguous to me (but I will learn them if you continue telling me that they are good ways to deal with CSV). So can you please explain to me how to use LibreOffice or is there any other SAS-related or a user-friendly way to create 45 CSV files from one workbook for all workbooks, is there any relative error-free way to do so?

 

Secondly, normally, whether I should put these CSV files into 64 folders, each folder contains 45 CSV files representing 45 variables? It is my code for dealing with 64 excel files by using importing(successfully tested).

I put my importing code with excel here for a clearer picture. What I aim for is I want to have the same result as in the code below but using CSV engine instead of Excel.

 

And one more further question, should I use PROC IMPORT or INFILE statement to import CSV files, please?

 

OPTIONS MPRINT;

%macro ImportAndTranspose(
			File=
			, cur=
			, outf=
			, StartSheet=
			, EndSheet=
			);
	%local i;

	%do i = &StartSheet. %to &EndSheet.;
		%if &i=1 %then
			%do;

				proc import datafile= "&File." 
					out= &outf.&i.
					dbms= xlsx 
					replace;
					range= "Sheet1$A:X";
					getnames= yes;
				run;

				proc sort data= &outf.&i.;
					by Type;
				run;

			%end;
		%else %if &i=34 %then
			%do;

				proc import datafile= "&File." 
					out= &outf.&i.
					dbms= xlsx 
					replace;
					range= "Sheet&i.$A:AG";
					getnames= yes;
				run;

				proc sort data= &outf.&i.;
					by Type;
				run;

				proc transpose data= &outf.&i. 
					out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
					by Type;
					VAR '1988'N - '2019'N;
				run;

				data &outf.&i._outx;
					set &outf.&i._out;

					if s&i. in: ('NA', '$$', '') then s&i. = " ";/********/
				run;

			%end;
		%else %if (&i ne 1) and (&i ne 34) %then
			%do;

				proc import datafile= "&File." 
					out= &outf.&i.
					dbms= xlsx 
					replace;
					range= "Sheet&i.$A:AG";
					getnames= yes;
				run;

				proc sort data= &outf.&i.;
					by Type;
				run;

				proc transpose data= &outf.&i. 
					out= &outf.&i._out(rename=(COL1=s&i. _NAME_=Year) drop=_label_);
					by Type;
					VAR '1988'N - '2019'N;
				run;

				data &outf.&i._outx;
					set &outf.&i._out;

					if input(s&i., ?? 32.) = . then s&i.2=.;/*added line 21_Jan*/
					else s&i.2=input(s&i., 32.);/*added line 21_Jan*/

					/*if s&i. in: ('NA', '$$', '..') then s&i.2=.;
								else if s&i. not in: ('NA', '$$', '..') then s&i.2=input(s&i., 32.);*/
					drop s&i.;
					rename s&i.2=s&i.;
				run;

			%end;
	%end;

	/* Merging*/
	%let outfm = %substr(&outf,1,%length(&outf)-5);

	data &outfm.merge1;
		merge
			%do i =&StartSheet.+1 %to &EndSheet.;
		&outf.&i._outx(keep= type year s&i.)
		%end;
		;
		by type year;
	run;

	data &outfm.merge2 (drop=NAME);
		merge
			&outf.&StartSheet.
			&outfm.merge1 
		;
		by type;
	run;

	*********************************************;
	/* Delete file results to reduce operating*/
	*************;
	%local i;

	%do i = &StartSheet.+1 %to &EndSheet.;

		*+1 because sheet 1 behave differently*;
		proc delete data=&outf.&i.;

		proc delete data=&outf.&i._out;

		proc delete data=&outf.&i._outx;
	%end;

	*****************;
	proc delete data=&outf.1;

	proc delete data=&outfm.merge1;
		**********************************************;
		************************************************;
		**EXPORT DATASET**;
		/*https://communities.sas.com/t5/SAS-Programming/making-a-sas7bdat-file/td-p/417487*/
		libname myfolder 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\2Cfiltering';

	proc copy in=work out=myfolder;
		select &outfm.merge2;
	run;

	quit;

	/*proc delete data=&outfm.filter;*/
%mend;

/*Replicate all files in one folder*/
filename mydir 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\leniency & operating\dataxlsx';

data _null_;
	did = dopen('mydir');

	do i = 1 to dnum(did);
		fname = scan(dread(did,i),1,'.');
/*fname=ARGENTINAARS for a particular country where last three characters(ARS) are currency and before that(ARGENTINA) is country name*/ length short_fn $29 currency $3; short_fn= cats(substr(fname, 1,length(fname)-3),'_'); currency=substr(fname,length(fname)-2); cmd=cats('%ImportAndTranspose(File=C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\leniency & operating\dataxlsx\', strip(fname), ',cur=',currency, ',outf=',short_fn,'sheet,startsheet=1,endsheet=45);'); call execute(cmd); end; keep fname; run;

I believe that, with your help, I can be more involved and professional in working with SAS.

 

Warmest regards,

 

Many thanks in advance,

Phil.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
7 REPLIES 7
Phil_NZ
Barite | Level 11

Hi @ChrisNZ 

Mainly it is about the type and length of the numeric variables. It is very dangerous. I scare some of my numeric observation will be truncated due to the fact that EXCEL does not allow me to use GUESSINGROW=MAX to go through all the observations to have a final attribute for a variable.

 

For example, total assets of a company in the 30th row is 10 digit, says $10bil. but if GUESSINGROW in EXCEL  is automatically is 20, and the highest value for the first 20 observations is $999mil (9 digits). So, it makes sense if the 30th observation would be truncated. Even @Patrick  test with 1 million obs and seems GUESSINGROW works til the 1 millionth ob

 

Practically speaking, I also test data of US and UK, and I saw that the maximum of asset of a company in the excel file is similar to that in the SAS dataset using import. But because there is no official document that confirms that SAS will go through all observation to get the final attributes. Therefore, the accuracy cannot be confirmed...

 

Warmest regards,

Phil.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Patrick
Opal | Level 21

@Phil_NZ After 5 minutes Googling I found in a SAS Whitepaper here below statement. Does this ease your concern?

Patrick_0-1618812430468.png

 

Phil_NZ
Barite | Level 11

Hi @Patrick 

 

That's great, but I am still trying to find the whitepaper mentions that xlsx engine also scans all the rows in each column to determine the data length as well.

Can I ask your keyword that I can do a search, thank you?

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Patrick
Opal | Level 21

Click on "here" in my previous post. That gets you to the mentioned paper.

ChrisNZ
Tourmaline | Level 20

Doesn't option DBSASTYPE help?

The way this is going, you could read all columns as strings, and then transform then.

This way you control the reading process.

 

Kurt_Bremser
Super User

There is a dedicated tool for XLSX to CSV conversion called xlsx2csv, look into that; the LibreOffice convert from the commandline will always pick up the first sheet only.

A google search for "convert xlsx to csv with libreoffice in batch" might reveal further options.

 

Once you have a directory filled with CSV's, a single data step will read them all with consistent attributes.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 556 views
  • 3 likes
  • 4 in conversation