BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
saraphdnz
Quartz | Level 8

Hello there - I have multiple folders that contain CSV files and the number of .csv files varies from one folder to another folder and I can see a max of 100 CSVs. 

I am using 

%do i=1 %to &memcnt; 

while importing the folders and not all CSV files have data, so import macro skipping those CSV's and I can see different datasets numbers (for some folders) like D1 D3 D4 D7 D45 ...

Now, my question is that is there a way to generalize the dataset names to call them in a macro for further analysis as I can't just use the below command for all.

 

%do i =1 %to 100;
data test_D&i; /*datasets created in WORK from importing*/
set D&i;

I am getting an error when I  merge the datasets by dates.

ERROR: BY variable date is not on input data set WORK.D63.
ERROR: BY variable date is not on input data set WORK.D64.
ERROR: BY variable date is not on input data set WORK.D65.
NOTE: The SAS System stopped processing this step because of errors.
WARNING: The data set OUT.TEST may be incomplete.  When this step was stopped there were 0 observations and 7 variables.
NOTE: DATA statement used (Total process time):
      real time           0.18 seconds
      cpu time            0.15 seconds

 

Please advise.

 

Regards, Sara

 

1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

If the datasets of interests are actually D1, D2, ...., and no other datasets in the sas library have names that begin with the letter "d", then

 

data all_datasets;
  set work.d: indsname=indsn;
  source_dsn=scan(indsn,-1,'.');
run;

You don't need a macro to process one dataset at a time.  You can process all of them in a single DATA step, as above.

 

Now if all your CSV files have exactly the same data layout, then I imagine someone else in this community will show you how to process all of the CSV files in a single DATA step, thereby avoiding making the intermediate SAS datasets.  But I'm too lazy today to refresh myself on the appropriate UNIX or WINDOWS statements required to construct a stream of data from all CSV files.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

7 REPLIES 7
Tom
Super User Tom
Super User

Hard to tell what you are talking about.

1) Are all of the CSV files supposed to have the same structure? Same variables in the same order. If so then it is much easier to just read them all at once into one dataset.

 

2) Don't you know the names of the dataset you want to work on? 

 

If you want to pass a list of names to a macro just a space delimited list. 

%mymacro(dslist=ds1 ds256 ds3245);

If you need to use them all together just use the macro's value directly.

data all;
  set &dslist;
run;

If you need to use them separately the use %SCAN().

%do i=1 %to %sysfunc(countw(&dslist,%str( ));
  %let ds=%scan(&dslist,&i,%str( ));
  ....
%end;

 

saraphdnz
Quartz | Level 8

Hi Tom,

 

1. yes the CSV files are of the same structure with the same variables in the same order.

2. I know the names of the CSV files but that name doesn't matter so I import them as D&i

3. I want to use all imported files to do the analysis. 

so if I want to format the dates of all the imported files - how do I write the code to call the D&i here even though D&i are not in sequence?

%macro test1;
%do i =1 %to 100;
data test_D&i (rename = ( var2 = debt var3 = datestamp))  ;
set D&i;
IF var3 = 0 OR var3 = "0" THEN DELETE;
YEAR1 = SCAN (var3,3 ,'-') * 1; 
MONTH1 = SCAN (var3,2,'-');
DAY1= SCAN (var3,1,'-')* 1;
DAY = DAY(var3);
MONTH = MONTH(var3);
YEAR = YEAR(var3);
date = mdy(month,day, Year);
format date date9.;
drop day1 month1 year1;
run;

%end;
%mend ;
%test1;

 

or do I need to make the changes in my import macro step thanks? 


proc import datafile="&dir\%qsysfunc(dread(&did,&i))" out=d&cnt dbms=csv replace;

 

 

 

 

 

ChrisNZ
Tourmaline | Level 20
IF var3 = 0 OR var3 = "0" THEN DELETE;

This is dirty code. Your variables should not have an unknown or changing type.

 

Likewise, your date should have been imported as such, not as a text string .

 

Solve all these problems (and more to come for sure) by using a data step+infile+input instead of proc import.

happy_sas_kitty
Obsidian | Level 7

The code you posted here is not causing the error in your original post.

The error says: by variable DATE is missing.

It seems that you wrote a "by" somewhere for those 3 data steps.

 

mkeintz
PROC Star

If the datasets of interests are actually D1, D2, ...., and no other datasets in the sas library have names that begin with the letter "d", then

 

data all_datasets;
  set work.d: indsname=indsn;
  source_dsn=scan(indsn,-1,'.');
run;

You don't need a macro to process one dataset at a time.  You can process all of them in a single DATA step, as above.

 

Now if all your CSV files have exactly the same data layout, then I imagine someone else in this community will show you how to process all of the CSV files in a single DATA step, thereby avoiding making the intermediate SAS datasets.  But I'm too lazy today to refresh myself on the appropriate UNIX or WINDOWS statements required to construct a stream of data from all CSV files.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Tom
Super User Tom
Super User

So write one data step to read in ALL of the CSV files at once.  In addition to removing the confusion of macro logic you also will get full control over how the values are read to begin with.

 

If your variables are named VAR1, VAR2 etc then probably the CSV files do not have header rows.  That makes reading them even easier.  Perhaps something like this:

 

data want;
  infile '/somedirectory/*.csv' dsd truncover ;
  length var1 $20  debt  8 datestamp 8;
  informat datestamp ddmmyy.;
  format datestamp yymmdd10.;
run;
saraphdnz
Quartz | Level 8

Thanks, everyone for your valuable comments and suggestions. it seems I have to learn a lot.

 

Regards,

Sara 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1333 views
  • 3 likes
  • 5 in conversation