Solved: Wildcard? Macros? Macros with a wildcard?

Sudeep_Neupane · Posted 12-02-2021 12:54 PM

Hi,

I have multiple sas transport files (.stc) that I am trying to import first into the sas data files (.sas7bdat). I am trying to use the macro to import all at once. However, my datafiles are named similar to sn55632-00001_blah-qwerty.stc, sn55632-00002_blah-yuiop.stc, sn55632-00003_blah-asdfg.stc, and so on. So, if you see the pattern here, the numbers after the first hypen are increasing by a unit. That's why I found somewhere that I could do 'mypath/sn55632-0000:.stc' (notice the colon there before .stc). However, it's not working for me for some reasons. Could anyone please guide me how I can go about it?

Thank you!

SASKiwi · Posted 12-02-2021 03:41 PM

Use CALL EXECUTE with @Tom 's program:

%macro import (filename = );

proc cimport lib=w infile = "&filename";
run;

%mend import;

data files;
  rc=filename('dir', 'C:\Users\Public\ABCDE\alldata');
  did=dopen('dir');
  do index=1 to dnum(did);
    length filename $200;
   filename=dread(did,index);
   if 'stc'=scan(filename,-1,'.') then 
   call execute('%import(filename =' !! strip(filename) !! ');');
  end;
  rc=dclose(did);
run;

View solution in original post

SASKiwi · Posted 12-02-2021 01:57 PM

In SAS the colon is not a true wild card. It allows you to select variables that start with the text to the left of the colon. You can't use it in file names where you need to follow the OS file naming rules. There are other ways of doing what you want but you first need to post a working hard-coded version of your program.

Sudeep_Neupane · Posted 12-02-2021 02:23 PM

I am working on this private data on a standalone device so I cannot really share exactly what I am doing. However, I can probably write the same code with different filenames to show here. What would be the relevant wildcard operator instead of colon in SAS as you mentioned?

Sudeep_Neupane · Posted 12-02-2021 02:34 PM

option nofmterr;

libname w 'C:\Users\Public\ABCDE\alldata';

%macro import (n);

filename transfile 'C:\Users\Public\ABCDE\alldata\sn55632-0000&n.\:.stc';

proc cimport lib=w infile =tranfile;

run;

%mend import;

%import (1);

This is what I have right now.

The filenames are similar to:

sn55632-00001_blah-qwerty.stc

sn55632-00002_blah-yuiop.stc

sn55632-00003_blah-asdfg.stc

.

sn55632-00049_blah-asdfg.stc

Tom · Posted 12-02-2021 02:45 PM

Looks like you want to build the list of filename first.

If you can run operating system commands then use that. For example on unix you could use the ls command.

data files;
  infile "ls /users/public/abdcde/alldata/*.stc" pipe truncover;
  input filename $200.;
run;

Or for Window you might use the DIR command.

data files;
  infile "c:; cd C:\Users\Public\ABCDE\alldata; dir /b *.stc" pipe truncover;
  input filename $200.;
run;

Or you might need to use DOPEN() and DREAD() commands instead to get the list of files.

data files;
  rc=filename('dir', 'C:\Users\Public\ABCDE\alldata');
  did=dopen('dir');
  do index=1 to dnum(did);
    length filename $200;
   filename=dread(did,index);
   if 'stc'=scan(filename,-1,'.') then output;
  end;
  rc=dclose(did);
run;

Once you have the list of filenames use that to generate the code to process each STC file.

What type of files are these STC files? The STC extension has no fixed meaning. If you don't know that they are take a look at the first few records in the file.

data _null_;
  infile 'C:\Users\Public\ABCDE\alldata\*.stc' lrecl=80 recfm=f obs=5;
  input;
  list;
run;

Sudeep_Neupane · Posted 12-02-2021 03:18 PM

Awesome. Thank you!

The 3rd block of code you provided worked great. I might still need some help looping through those names to import them as .sas7bdat files.

SASKiwi · Posted 12-02-2021 03:41 PM

Use CALL EXECUTE with @Tom 's program:

%macro import (filename = );

proc cimport lib=w infile = "&filename";
run;

%mend import;

data files;
  rc=filename('dir', 'C:\Users\Public\ABCDE\alldata');
  did=dopen('dir');
  do index=1 to dnum(did);
    length filename $200;
   filename=dread(did,index);
   if 'stc'=scan(filename,-1,'.') then 
   call execute('%import(filename =' !! strip(filename) !! ');');
  end;
  rc=dclose(did);
run;

Sudeep_Neupane · Posted 12-03-2021 02:47 PM

Thank you @Tom and @SASKiwi. No doubt you all are Super User. I had never utilized the forum before. Well, I guess I am under the shadow of the legends.

Muchas Gracias!

Sudeep_Neupane · Posted 12-03-2021 03:31 PM

Further Question: I am using a similar method to sort the data.

%macro sort (filename = );

proc sort data=w.&filename;
by ID;
run;

%mend sort;

data files;
  rc=filename('dir', 'C:\Users\Public\ABCDE\alldata');
  did=dopen('dir');
  do index=1 to dnum(did);
    length filename $200;
   filename=dread(did,index);
   if 'sas7bdat'=scan(filename,-1,'.') then 
   call execute('%sort(filename =' !! strip(filename) !! ');');
  end;
  rc=dclose(did);
run;

The problem with this is that the filename includes '.sas7bdat', and when it goes to proc sort, it'll throw an error because proc sort data=filename needs the filenames without extensions. How do I do it? I tried doing strip(filename-'.sas7bdat') to see if that's a thing but it didn't work either.

Thank you!

Tom · Posted 12-03-2021 04:02 PM

You can either use SCAN() to extract just the member name from the filename.

call execute(cats('%sort(filename=',scan(filename,1,'.'),')'));

Or change the macro to accept a full dataset name instead of just a member name in the W libref.

%macro sort (dataset);

proc sort data=&dataset ;
by ID;
run;

%mend sort;

And then you could pass in either W.MEMBER or just the quoted physical name.

call execute(cats('%sort(',quote(catx('\','C:\Users\Public\ABCDE\alldata',filename)),')'));

But if you already have a libref pointing to the directory there is no need to look for files. Just look for datasets. In fact you should only look for datasets that have the ID variable.

libname w 'C:\Users\Public\ABCDE\alldata';
proc sql;
create table members as
  select libname,memname
  from dictionary.columns
  where libname='W' and upcase(name)='ID'
  order by libname,memname
;
quit;
data _null_;
  set members;
  call execute(catx(' ','proc sort data=',catx('.',libname,memname),';by id;run;'));
run;

Sudeep_Neupane · Posted 12-03-2021 04:27 PM

Wonderful. It worked like a charm! And thank you for pointing out the good programming practice out there.

Sudeep_Neupane · Posted 12-03-2021 05:09 PM

Is it okay if I ask more questions @Tom ? I feel like I have been learning a lot this way.

So, I have 134 datasets (now SAS datasets, thanks to you), all sorted within the same library. Now I need to merge them in two ways.

First, I need to merge, them into four phases:

1st Phase: s12345_0001_ads_qwerty

....

s12345_0049_ads_cvbnm

2nd Phase: s6789_0001_ads2_dfghk

......

s6789_0028_ads2_zxcvb

and so on.

If you see the pattern here, the first name before the underscore stays the same, while the number after the underscore increases by 1 throughout the phase.

Secondly, I need to merge all 134 of them into one dataset.

The problem with using similar macros, in this case, is I guess you have one filename passed to it at once, so SAS doesn't recognize what to merge. While with the SQL, I have hard time assigning dataname into the table, if that's the way.

Also, I have not used loops to run individual files through the merge statement yet.

Thank you!

Tom · Posted 12-03-2021 06:11 PM

So each dataset has ID and some other set of distinct variables that have different names than the variables in the other similarly named datasets?

data s12345;
  merge w.s12345_: ;
  by id;
run;
data s6789;
  merge w:s6789_:;
  by id;
run;

Are you sure don't want to SET them together instead of MERGEing them?

data s12345;
  set w.s12345_: indsname=indsname;
  by id;
  source=indsname;
run;

Sudeep_Neupane · Posted 12-03-2021 07:27 PM

Yes. Each dataset has ID as a unique identifier. There are ~1400 observations altogether i.e. number of participants. 1st phase would have different datasets for different observations for 3 years on those participants.
2nd phase has those observations for another 3years for the same participants. Likewise for the 3rd and 4th phases.
So, if I set them, there will be 4(number of phases)*1400 observations = 6400 observations, while I only want 1400 observations.

Tom · Posted 12-03-2021 10:41 PM

If you have 4 years of data for 1,400 subjects then you SHOULD have 4*1,400 = 6,400 observations.

Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

Re: Wildcard? Macros? Macros with a wildcard?

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away