_el_doredo
Quartz | Level 8

Hello Experts,

I hope I am able to get an answer for my requirement in this forum.

 

Requirement: I have more than 100 files (in xlsx, xls, and csv formats) in a path. I need to import only one observation from each of these files. So my output should be 100 datasets for 100 files, and each dataset should contain only one observation (I don't want the remaining records in my datasets).

 

This is my code:

 

%let file_path=C:\Users\sas\coding\;

/*************************************************************************/
/* FETCHING ALL FILE NAME FROM PATH */
/*************************************************************************/

data fetching_files1;
  length fref $8 fname $200;
  did = filename(fref,"&file_path.");
  did = dopen(fref);
  do i=1 to dnum(did);
    fname = dread(did,i);
    output;
  end;
  did = dclose(did);
  did = filename(fref);
  keep fname;
run;

/*************************************************************************/
/* SEPERATING FILE NAME AND FILE TYPE */
/*************************************************************************/

data fetching_files;
  set fetching_files1;
  name=scan(fname,1,'.');
  type=scan(fname,2,'.');
run;
/*************************************************************************/
/* IMPORTING ALL FILES INTO WORK LIBRARY */
/*************************************************************************/
%macro importing_data(dataset);

proc sql;
  select count(*) into :NObs trimmed from &dataset;
  select fname into :file1-:file&NObs. from &dataset;
  select name  into :name1-:name&NObs. from &dataset;
  select type  into :type1-:type&NObs. from &dataset;
quit;

%do i=1 %to &nobs;
  proc import datafile="&file_path.\&&file&i."
    out="&&name&i."n dbms=&&type&i replace;
    getnames=yes;
  run;
%end;

%mend importing_data;

%importing_data(fetching_files);

I tried two methods:
   1. Using the OBS system option before PROC IMPORT
%do i=1 %to &nobs;
  options obs=1;
  proc import datafile="&file_path.\&&file&i."
    out="&&name&i."n dbms=&&type&i replace;
    getnames=yes;
  run;
  options obs=max;
%end;
  2. Giving the last row of the sheet as the DATAROW
 
%do i=1 %to &nobs;
  proc import datafile="&file_path.\&&file&i."
    out="&&name&i."n dbms=&&type&i replace;
    getnames=yes;
    datarow=1048576;
  run;
%end;
But these two methods consume the same amount of time as importing the entire file.

Please suggest a less time-consuming method to import only one row.
 
Thanks in advance
Tom
Super User

Why would you use PROC IMPORT to read two lines from a text file?

Just read the two lines yourself instead.

 

If you don't care about the headers then just read the second line.

data topline;
  infile 'myfile.csv' dsd truncover column=cc length=ll firstobs=2 obs=2 ;
  do column=1 by 1 until(cc>ll);
     input string :$100. @;
     output;
  end;
run;

If you do care about the headers then read those first and store them into an array or a hash object.

data topline;
  infile 'myfile.csv' dsd truncover column=cc length=ll ;
  array headers [1000] $100 _temporary_;
  do column=1 by 1 until(cc>ll);
     input headers[column] @;
  end;
  input;
  do column=1 by 1 until(cc>ll);
     header=headers[column];
     input string :$100. @;
     output;
  end;
  stop;
run;
_el_doredo
Quartz | Level 8

Hello @Tom,
Thanks for that. But I am trying to import Excel files as well (xlsx and xls formats). We are not able to import Excel files using the INFILE statement, right? Correct me if I am wrong.

And I also need to import more than 100 files.

Note: Yes, I need the headers as well.

Tom
Super User

Before you try to figure out how to do something 100 times you need to figure out how to do it once.

To read an XLSX file you can use the XLSX libname engine.  That will give you access to every worksheet in the file as if it were a dataset.

To keep only one observation from an XLSX file you can use the OBS system option in combination with PROC COPY.  Let's try it out. First let's make an XLSX file with two sheets. Then try copying out the sheets.

filename xx temp;
libname xx xlsx "%sysfunc(pathname(xx))";
data xx.male ;
  set sashelp.class;
  if sex='M' ;
run;
data xx.female ;
  set sashelp.class;
  if sex='F' ;
run;

options obs=1;
proc copy inlib=xx outlib=work; run;
options obs=max;

See how well it worked.

714   libname xx xlsx "%sysfunc(pathname(xx))";
NOTE: Libref XX was successfully assigned as follows:
      Engine:        XLSX
      Physical Name: C:\Users\ABERNA~1\AppData\Local\Temp\1\SAS Temporary Files\_TD14144_AMRAPY3WVP0VKU0_\#LN00058
715   data xx.male ;
716     set sashelp.class;
717     if sex='M' ;
718   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set XX.male has 10 observations and 5 variables.
NOTE: The export data set has 10 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


719   data xx.female ;
720     set sashelp.class;
721     if sex='F' ;
722   run;

NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: The data set XX.female has 9 observations and 5 variables.
NOTE: The export data set has 9 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds


723
724   options obs=1;
725   proc copy inlib=xx outlib=work; run;

NOTE: Copying XX.FEMALE to WORK.FEMALE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 1 observations read from the data set XX.FEMALE.
NOTE: The data set WORK.FEMALE has 1 observations and 5 variables.
NOTE: Copying XX.MALE to WORK.MALE (memtype=DATA).
NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used.
NOTE: There were 1 observations read from the data set XX.MALE.
NOTE: The data set WORK.MALE has 1 observations and 5 variables.
NOTE: PROCEDURE COPY used (Total process time):
      real time           0.03 seconds
      cpu time            0.03 seconds


726   options obs=max;

If you need to keep track of which sheets were in each particular file, you might need to work harder.
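For illustration, here is a minimal sketch (an assumption, not code from this thread) of one way to record which worksheets came from which workbook: once the XLSX libname is assigned, DICTIONARY.TABLES lists the worksheet names, so you can save them before the PROC COPY step. The file path is hypothetical.

/* List the worksheets behind an XLSX libref so you can record */
/* which sheets belonged to which workbook.                    */
libname xx xlsx "C:\Users\sas\coding\somefile.xlsx";

proc sql;
  create table work.sheet_list as
  select memname as sheet_name
  from dictionary.tables
  where libname = 'XX';
quit;

libname xx clear;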

 

For XLS files it will be harder to read more than just the first sheet.

Do you know if any of the XLS files have more than one sheet?

 

Also, PROC IMPORT for XLS (and XLSX) files does not seem to honor the OBS system option, so you will need to add another step to truncate the dataset after creating it. Again, let's run a test.

filename xx temp;
proc export data=sashelp.class file=xx dbms=xls ; run;
proc import file=xx dbms=xls out=class replace; run;
data class;
  set class(obs=1);
run;

Results

740   filename xx temp;
741   proc export data=sashelp.class file=xx dbms=xls ;
741 !                                                   run;

NOTE: The export data set has 19 observations and 5 variables.
NOTE: "XX" file was successfully created.
NOTE: PROCEDURE EXPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds


742   proc import file=xx dbms=xls out=class replace;
742 !                                                 run;

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 19 observations and 5 variables.
NOTE: WORK.CLASS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds


743   data class;
744     set class(obs=1);
745   run;

NOTE: There were 1 observations read from the data set WORK.CLASS.
NOTE: The data set WORK.CLASS has 1 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
_el_doredo
Quartz | Level 8
Hello @Tom,

The CLASS dataset contains only 19 observations and 5 variables, but each of my files contains more than 40,000 observations and 40 variables. In real time it is taking 4 minutes to fetch one file.

As I have more than 100 files, it's taking a lot of time, so I am looking for an easier way.
Tom
Super User

FETCH makes it sound like you are moving the files from some remote location to the machine where SAS is running.  In that case you will have to move the whole file for SAS to be able to use it, especially for binary file formats like XLS or XLSX.

Tom
Super User

You can use the RANGE option to read only two of the rows in an Excel worksheet.

Make sure to read enough columns.  It should only create variables for the columns that have values.

 

Example:

filename xx temp;
proc export data=sashelp.class file=xx dbms=xls ; run;
proc import file=xx dbms=xls out=class replace; 
  range='$A1:Z2';
run;

This reads one observation and 5 variables even though we told it to read 26 columns.

758   proc import file=xx dbms=xls out=class replace;
759     range='$A1:Z2';
760   run;

NOTE: One or more variables were converted because the data type is not supported by the V9 engine. For more details, run with options MSGLEVEL=I.
NOTE: The import data set has 1 observations and 5 variables.
NOTE: WORK.CLASS data set was successfully created.
NOTE: PROCEDURE IMPORT used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
Patrick
Opal | Level 21

@_el_doredo wrote:
Hello @Tom,

The CLASS dataset contains only 19 observations and 5 variables, but each of my files contains more than 40,000 observations and 40 variables. In real time it is taking 4 minutes to fetch one file.

As I have more than 100 files, it's taking a lot of time, so I am looking for an easier way.

4 minutes per file with "only" 40,000 rows is excessive. SAS itself can import such data in no time (below 1 second).

What you need to figure out for reading a single file is where your process spends all this time.

 

Using PROC IMPORT you should always scan the full file (GUESSINGROWS=MAX); otherwise you will likely end up with tables that have varying data types for the "same" variable, which then causes extra work later in the process.
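
As a hedged illustration of that suggestion (the file name is hypothetical, not from the thread):

/* Scan the whole file before assigning variable types, so the */
/* "same" column gets a consistent type across imported files. */
proc import datafile="C:\Users\sas\coding\example.csv"
  out=work.example dbms=csv replace;
  getnames=yes;
  guessingrows=max;
run;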

xxformat_com
Barite | Level 11

Hi, to answer your question: it is possible to convert one tab of an Excel file into a CSV file. Check the code provided by @jos_monen here:  https://communities.sas.com/t5/SAS-Programming/How-to-import-excel-with-guessingrow-function/td-p/42...

xxformat_com
Barite | Level 11

You can use the DATAROW= statement in PROC IMPORT when using DBMS=XLSX.

But I have no idea whether it is more performant.
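
For reference, a minimal sketch of the DATAROW= statement with DBMS=XLSX (the file name is hypothetical). Note that DATAROW= only sets the row where reading starts, so on its own it does not cap how many rows are read.

proc import datafile="C:\Users\sas\coding\example.xlsx"
  out=work.example dbms=xlsx replace;
  getnames=yes;
  datarow=2;  /* begin reading data at row 2 (row 1 holds the headers) */
run;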

ballardw
Super User

Try setting

options obs=2;

before import.

IF the datasets have one row of header information, this should read only one data value with PROC IMPORT.

The "first" obs read is the header information, so you need at least OBS=2, which will temporarily limit all processing to 2 observations. So you want to reset to the default OBS=MAX immediately after importing your files.

 

BUT a common problem with many XLSX and other spreadsheet files is that people insist on having multiple rows of column headers.
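
A minimal sketch of the OBS=2 approach described above (the file name is hypothetical). As noted earlier in the thread, PROC IMPORT for Excel files may not honor the OBS option, so test this against your own files.

options obs=2;                      /* header row + one data row       */
proc import datafile="C:\Users\sas\coding\example.xlsx"
  out=work.example dbms=xlsx replace;
  getnames=yes;
run;
options obs=max;                    /* restore the default afterwards  */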

