BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi all, I am importing 64 sas7bdat files in one folder to SAS, my code is as below working well

 

/*macro importdata*/
%macro importdata(File=,outf=); Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation'; data &outf.; set input.&outf._; run; %mend;
/*replicate for all files in one folder*/ filename mydir 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation'; data _null_; did = dopen('mydir'); do i = 1 to dnum(did); fname = scan(dread(did,i),1,'.'); /*At this point I assign that the filename has the suffix is xlsx or sas7bdat fname= ARGENTINA_*/ length short_fn $29; short_fn= substr(fname, 1,length(fname)-1); /*short_fn=ARGENTINA*/ cmd=cats('%importdata(File=C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation\', strip(fname),',outf=',short_fn,');'); call execute(cmd); end; keep fname; run;

I want to ask if there is any way to optimize my code by shortening it, especially excluding the macro because I think it is less aesthetic if I call a macro for only importing files?

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@Phil_NZ wrote:

Hi @Tom 

 

Thank you for your notice.

However, I think your suggestion is to export a sas7bdat dataset (by using proc copy). what I want is to call 64 sas7bdat datasets in one folder to SAS EG to merge them vertically after that.

 

Warm regards.


Your first step would be make sure what the lengths and types of common named variables might be. ANY thing that appends data sets, which what you really mean with "merge vertically", is going to be somewhat picky about this. As is if the same named variable is different types then the process will fail and if there are different lengths then variable data may get truncated plus generating a lot of warning messages about different lengths.

 

Before starting this I would suggest getting to know your data by running some code similar to:

proc format;
value showx
low-high='x'
;
run;
proc tabulate data=sashelp.vcolumn;
   where libname='YOURLIB' and memtype='DATA';
   class memname name type length;
   table name*type*length,
         memname*n=' '*f=showx.
         /misstext=' '
   ;
run;

You would replace YOURLIB with the name of your library in uppercase as that is how the data is stored.

The format is just to display an X as there really should only ever be 1 in any of the cells (can't have the same varaible (name) more than once.

 

What you are looking for: Variables where the row indicates both types of variable, numeric or character. These are the ones that will cause your process to fail. Pick one type and modify all the variable to the same type.

Next, where the length is different for the same type. These are usually character variables but not always.

Best would be make sure all have the same length. But you can address latter issues by getting the LARGEST value of length for each of the variables and making your base data set with that length if you use Proc Append.

If you use the SET dataset1 dataset2 ... ; approach you can add a LENGTH statement above the SET to specify the length and should be largest to reduce loss of data:

 

Data want;

   length var1 $ 55   var27 $ 18 var50 $ 500;

   SET dataset1 dataset2  ...   ;

run;

 

Ideally you should also specify a format for those variables because you could create a variable of length $ 500 but one of the contributing variables sets the display format to $10. and you then wonder where the data went when all that is happening is that your format isn't displaying the full value.

View solution in original post

12 REPLIES 12
Tom
Super User Tom
Super User

Huh?

You don't need to "import" SAS datasets.  Just use them.

libname in 'path1';
libname out 'path2';
proc copy inlib=in outlib=out mt=data;
run;
Phil_NZ
Barite | Level 11

Hi @Tom 

Thank you for your suggestion, I assume that you are mentioning this code

%macro importdata(File=,outf=);
Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation';

    data &outf.;

    set input.&outf._;

  run;
%mend;

If I do not import like that, what should I do by the way you mention? The folder contains 64 files sas7bdat.

 

Many thanks and warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
andreas_lds
Jade | Level 19

"Import" means that you have a text-file and read it to create a sas dataset. When you already have datasets, you don't "import" them, you just use the libname-statement to define a library so that you can use the datasets. So after your libname-statement, you can (and should) write statements like

 

proc print data= input.DatasetName;
run;

If you want to avoid overwriting datasets in "input" accidentally, you can make the library read-only:

libname input "your_path" access=readonly;
Phil_NZ
Barite | Level 11

Hi @andreas_lds ,

 

If I understand you correctly, the code will be re-written as below:

%macro importdata(File=,outf=);
Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy' access=readonly;

 proc print data= input.&outf._;

  run;
%mend;

filename mydir 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy';
data _null_;
did = dopen('mydir');
do i = 1 to dnum(did);
  fname = scan(dread(did,i),1,'.');
  /*At this point I assign that the filename has the suffix is xlsx or sas7bdat
  fname= ARGENTINA_*/
  length short_fn $29;
  short_fn= substr(fname, 1,length(fname)-1);
  /*short_fn=ARGENTINA*/

  cmd=cats('%importdata(File=C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy\',
      strip(fname),',outf=',short_fn,');');
  call execute(cmd);
end;
keep fname;
run;

This code run very slow when my initial code is very fast in running and I think I explain you improperly.

 

What I want is to have these datasets for concatenating these datasets together (merge them vertically).

 

Many thanks and warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
andreas_lds
Jade | Level 19

Remove the macro, except for the libname-statement.

The proc print was just an example to show the usage of the datasets in the library "input".

Tom
Super User Tom
Super User

You have posted code that manually queries the list of files in a physical directory and when it finds the name of a SAS dataset it then runs a data step to copy the dataset from its original location into the SAS WORK library.

So if you want to copy all of the datasets from the INPUT libref to the WORK libref you could just run this one PROC COPY step.

proc copy inlib=INPUT outlib=WORK mt=data ;
run;

Now if you want to instead concatenate all of the dataset in the INPUT libref into a single WORK dataset you could try running a data step like this:

data want;
  set input.ds1 input.ds2 ..... input.ds64 ;
run;

So how you can generate that list of dataset names to use in the SET statement?  You can query the SAS metadata, no need to go and query the physical operating system directory, let SAS handle that for you.

Conceptually the simplest is to learn how to use PROC CONTENTS.  To get the content information for all dataset in a libref use the _ALL_ special keyword in the DATA= option.  You can add the NOPRINT output if you don't want to bother seeing a printout of each datasets contents information.

proc contents data=INPUT._ALL_ noprint out=contents; 
run;

Now you can use the MEMNAME and LIBNAME variables in the CONTENTS dataset generated by PROC CONTENTS to get the list to use for your concatenation step.

proc sql noprint;
select distinct catx('.',libname,memname)
  into :dslist separated by ' '
from contents
;
quit;
data want;
  set &dslist.;
run;
Phil_NZ
Barite | Level 11

Hi @andreas_lds !

 

Thank you for your suggestion. Sorry for my late replying.

I am interested in the option "access=readonly". And whether it equals to @Tom 's code below regarding "preventing overwrite on the existing dataset"?

 

Tom's code

Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy';
/*If you want to avoid overwriting datasets in "input" accidentally, you can make the library read-only*/

proc copy inlib=input outlib=work mt=data;
run;

Your code

libname input "C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy" access=readonly;

So, in general, I think your codes are similar in preventing overwrite on the existing dataset, just in your case, I should work with the input library while I should work with the work library in @Tom 's case. Sorry for this novice question but I would love to be prudent in using code.

 

Many thanks and warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Phil_NZ
Barite | Level 11

Hi @Tom 

 

Thank you for your notice.

However, I think your suggestion is to export a sas7bdat dataset (by using proc copy). what I want is to call 64 sas7bdat datasets in one folder to SAS EG to merge them vertically after that.

 

Warm regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
ballardw
Super User

@Phil_NZ wrote:

Hi @Tom 

 

Thank you for your notice.

However, I think your suggestion is to export a sas7bdat dataset (by using proc copy). what I want is to call 64 sas7bdat datasets in one folder to SAS EG to merge them vertically after that.

 

Warm regards.


Your first step would be make sure what the lengths and types of common named variables might be. ANY thing that appends data sets, which what you really mean with "merge vertically", is going to be somewhat picky about this. As is if the same named variable is different types then the process will fail and if there are different lengths then variable data may get truncated plus generating a lot of warning messages about different lengths.

 

Before starting this I would suggest getting to know your data by running some code similar to:

proc format;
value showx
low-high='x'
;
run;
proc tabulate data=sashelp.vcolumn;
   where libname='YOURLIB' and memtype='DATA';
   class memname name type length;
   table name*type*length,
         memname*n=' '*f=showx.
         /misstext=' '
   ;
run;

You would replace YOURLIB with the name of your library in uppercase as that is how the data is stored.

The format is just to display an X as there really should only ever be 1 in any of the cells (can't have the same varaible (name) more than once.

 

What you are looking for: Variables where the row indicates both types of variable, numeric or character. These are the ones that will cause your process to fail. Pick one type and modify all the variable to the same type.

Next, where the length is different for the same type. These are usually character variables but not always.

Best would be make sure all have the same length. But you can address latter issues by getting the LARGEST value of length for each of the variables and making your base data set with that length if you use Proc Append.

If you use the SET dataset1 dataset2 ... ; approach you can add a LENGTH statement above the SET to specify the length and should be largest to reduce loss of data:

 

Data want;

   length var1 $ 55   var27 $ 18 var50 $ 500;

   SET dataset1 dataset2  ...   ;

run;

 

Ideally you should also specify a format for those variables because you could create a variable of length $ 500 but one of the contributing variables sets the display format to $10. and you then wonder where the data went when all that is happening is that your format isn't displaying the full value.

Kurt_Bremser
Super User

@Phil_NZ wrote:

Hi @Tom 

 

Thank you for your notice.

However, I think your suggestion is to export a sas7bdat dataset (by using proc copy). what I want is to call 64 sas7bdat datasets in one folder to SAS EG to merge them vertically after that.

 

Warm regards.


PROC COPY does NOT (as in NOT) "export" a dataset, it copies a SAS dataset as is to another location. "Export" is used when the contents of a SAS dataset are converted to another file format, like Excel, CSV, or to a DBMS.

Tom
Super User Tom
Super User

It was your original code that was copying the dataset (don't say "import" as that implies that you are transforming the data from a foreign format like a text file or spreadsheet or other database system).  So the direct comparison was to use PROC COPY to perform the copy.  But as I said before if the real goal was to create one dataset that combines all of the original datasets there is no need to first copy them original files. Just read the original files when performing the aggregation step.  As I showed in the earlier post.

 

Using the ACESS=READONLY option on a libname statement is a good habit to get into as it will provide a little more protection from mistakes that might harm your original source data.  For example if you forget the first semi-colon in a program like this:

data work.ds1 ;
  set input.ds1;
run;

You have converted from a program that copies one dataset to one that creates three empty datasets.

data work.ds1 set input.ds1;
run;

 You are right that first copying the dataset to WORK (or some other libref) and then working with them there will also provide some protection from that type of mistake.  You can always go back to the original dataset if you accidentally destroy the copy.

Phil_NZ
Barite | Level 11

Hi @Tom , @Kurt_Bremser @ballardw , and @andreas_lds 

 

It is a long day for me and I am really excited that I have the chance to learn a lot from your suggestion. The picture is much more clearer.

First of all, I can separate the meaning of import/export data with the call or copy data from one to another library.

 

Secondly, I follow @Tom and @ballardw 's way of doing things ( using SET statement to concatenate dataset)


@Tom wrote:

 

Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy' access=readonly;
proc contents data=INPUT._ALL_ noprint out=contents; 
run;

proc sql noprint;
select distinct catx('.',libname,memname)
  into :dslist separated by ' '
from contents
;
quit;
data want;
  set &dslist.;
run;

And, you have converted from a program that copies one dataset to one that creates three empty datasets.

data work.ds1 set input.ds1;
run;

 


Can I ask why one creates three empty datasets? why "three"? Thank you.

 

And as @ballardw  notice, I hit the wall which fails me from getting the result. So, I follow @ballardw's specified proc tabulate to identify the mystery behind and it ends up like that my variable BDATE contains both numeric and char types.

My97_0-1612594439624.png

@ballardw  also gave me a solution that

What you are looking for: Variables where the row indicates both types of variable, numeric or character. These are the ones that will cause your process to fail. Pick one type and modify all the variable to the same type.

Can you please tell me how to code to "pick one type"? And what type do you think the date should be? char or numeric?

The current code is as below

 

Libname input 'C:\Users\pnguyen\OneDrive - Massey University\PhD JOURNEY\3calculation - Copy' access=readonly;

proc format;
value showx
low-high='x'
;
run;
proc tabulate data=sashelp.vcolumn;
   where libname='INPUT' and memtype='DATA';
   class memname name type length;
   table name*type*length,
         memname*n=' '*f=showx.
         /misstext=' '
   ;
run;

 

And I cannot understand the meaning of these codes, could you please explain it to me, especially "low-high='x'", I can't see any high or low variables so far in my current session.

proc format;
value showx
low-high='x'
;
run;

 

And last but not least, @ballardw also mentioned the  length assigned of variables

Ideally you should also specify a format for those variables because you could create a variable of length $ 500 but one of the contributing variables sets the display format to $10. and you then wonder where the data went when all that is happening is that your format isn't displaying the full value.

 

I am quite confused about the bold lines. From my novice understanding, $500 already contains $10, so I just think we set all length to $500.

 

Many thanks and warmest regards,

 

 

 

 

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 12 replies
  • 6813 views
  • 6 likes
  • 5 in conversation