rcsutherland
Fluorite | Level 6

Background: I need to use the FILENAME statement to run grep and use its output as input.

Here is my input data set named test

firstname   lastname   filename
<blank>     <blank>    cus_01.txt
<blank>     <blank>    cus_02.txt

Filename values are actual files which I need to grep because I need certain string inside those files to fill up the firstname and lastname

Here is the code:

data work.test;
   set work.test;
   call symputx('file', filename);
   filename fname pipe "grep ""Firstname"" <path>/&file.";
   filename lname pipe "grep ""Lastname"" <path>/&file.";
   infile fname;
   input firstname;
   infile lname;
   input lastname; 
run;

However, macro variables created inside a data step can't be used until after the data step procedure is completed. So, that means, &file. can't be resolved and can't be used in filename.

Is there a way to resolve the macro variable?

Thanks!

novinosrin
Tourmaline | Level 20

"However, macro variables created inside a data step can't be used until after the data step procedure is completed. "

 

You can counter that using Resolve function

 

http://support.sas.com/documentation/cdl/en/mcrolref/61885/HTML/default/viewer.htm#a000210258.htm

 

 

novinosrin
Tourmaline | Level 20

Example

 

'filename fname pipe "grep ""Firstname"" <path>/'||resolve('&file.');
rcsutherland
Fluorite | Level 6

Hi,

 

This doesn't seem to work, it doesn't recognize the resolve nor the || in the code. Here's what I did:

 

filename fname pipe "grep ""Firstname"" &path./" || resolve('&file.');

Error:

Invalid Option Name | |

Invalid Option Name file

Astounding
PROC Star

Given that you know the folder, and given that a variable FILENAME contains the name of file that you want to read, you should be able to eliminate the macro language entirely.  Note that the FILEVAR= option on the INFILE statement lets you specify as a variable the path to the file that should be read.  Here's some detail:

 

http://support.sas.com/techsup/technote/ts581.pdf

 

 

rcsutherland
Fluorite | Level 6

I'm new to SAS, can you explain how can FILEVAR help me?

 

I'm looking into this code from the example you gave:

 

 

filename indata pipe 'ls -1 /dept/tsd/dataread/files';
data one;
length fil2read $40 lname $20 fname $20 state $2;
infile indata truncover;
input f2r $20.;
fil2read='/dept/tsd/dataread/files/'||f2r;
infile dummy filevar=fil2read end=done;
do while(not done);
 input lname : $20. fname : $20. state $ phone;
 output;
end;
run;

From what I understand, the FILENAME statement makes the list of files in /dept/tsd/dataread/files available through indata. The first INFILE then reads that list, one name at a time, into f2r. The fil2read variable then just concatenates the path and the file name (f2r).

 

Where can I insert my grep command here? Thanks!

 

 

Astounding
PROC Star

Since you already have a list of files in a SAS data set named TEST, your program becomes much simpler, along these lines:

 

data want;
length f2r $ 70 lname $ 20 fname $ 20 state $ 2;
set test;
f2r = '/dept/tsd/dataread/files/' || filename;
infile dummy filevar=f2r end=done;
do until (done);
   input lname fname state phone;
   output;
end;
run;

Obviously, I don't have your data to test with.  But this should be a step in the right direction.  You'll have to be the one to test it and see how close we're getting to the result you want.

snoopy369
Barite | Level 11

FILEVAR helps you because it allows you to arbitrarily point an INFILE or FILE statement to a value defined as a data step variable (as opposed to a string in code).

 

Your grep command doesn't really make sense to me, because it seems like you're not using SAS the way it's intended to be used.  SAS is quite good at reading from files/strings/etc., and doesn't need help from grep; in fact you can use regular expressions if you wish ('perl regular expressions SAS' would be a good search, or the 'PRXMATCH' etc. functions).  

 

If you have a filename in a variable, and want to read some information from that file, then all you need is the step below (assuming 'have' is your dataset with filenames, stored in a variable named 'file_name_variable').  The 'a' in the infile statement is a placeholder fileref that has to be there but doesn't do anything.

 

data want;
  set have;
  infile a filevar=file_name_variable;
  input lname $ fname $;
run;

 

Assuming they're in that order (last name then first name, space separated).  I imagine they're not space separated, but you haven't clarified that.

 

Now, if this is a JSON file, you have some better ways to do this (including using the JSON Libname ), but one way is:

 

data want;
  set have;
  infile a filevar=file_name_variable;
  input @'"FirstName":' fname $
           @'"LastName":' lname $
   ;
run;

This basically uses @-string matching to find the string that matches '"FirstName":'  and then reads the characters after that into fname.  You might need to do further processing to make sure it terminates at the next " character, or read the file as quotation-delimited, or something else, but it's possible.  Do an internet search on "SAS JSON read" and you'll get a lot of results.

 

Now, if you have a more complex question - such as, those files don't necessarily have 'first name' and 'last name' in a consistent order - you certainly could use a regular expression to solve your problem (again, you probably don't need to, but who knows).


data want;
  set have;
  infile a filevar=file_name_variable;
  input @;   /* load the current line into _infile_ and hold it */
  _prx_fname = prxparse('~"FirstName":(".*")~ios');
  _prx_lname = prxparse('~"LastName":(".*")~ios');
  if prxmatch(_prx_fname,_infile_) then 
    fname = prxposn(_prx_fname,1,_infile_);
  if prxmatch(_prx_lname,_infile_) then 
    lname = prxposn(_prx_lname,1,_infile_);
run;

 

Basically, defining two regexps (one for each of fname/lname), then using PRXMATCH to test if they return a match, and if so, then using PRXPOSN to return the value from the parentheses (capture buffer).

 

I'm assuming in the latter that this is a JSON file, hence the quotation marks; you'll need to make the right regexp for your data here (and I didn't test this, so my PRX may be a bit rusty, but if it doesn't work feel free to come back with your code.)

rcsutherland
Fluorite | Level 6

Thank you all! FILEVAR is so powerful!

 

My only problem now is how to search for the required strings in each customer text file.

 

Here's what the customer file looks like:

 

ID:   '100001'

Firstname:  'Lonzo'

Lastname:   'Ball'

There are 3 spaces between the labels and the values. Input dataset is same as mentioned in my previous post.

 

Here's my code so far:

 

libname temp "<path>";
%let tmp = <path>;

data temp._output;
 set temp._input;
    length path f2r $500.;
    path = symget('tmp');
    f2r = cats(path,file);
run;

OUTPUT:

Firstname   Lastname   File         Path     F2R
<blank>     <blank>    cus_01.txt   <path>   <path>/cus_01.txt
<blank>     <blank>    cus_02.txt   <path>   <path>/cus_02.txt

 

Tom
Super User

Here's what the customer file looks like:

 

ID:   '100001'

Firstname:  'Lonzo'

Lastname:   'Ball'

There are 3 spaces between the labels and the values. Input dataset is same as mentioned in my previous post.

 

Are your files really that simple?

Why not just use the @'character-string' pointer control on the INPUT statement?

data test;
   length id $10 firstname lastname $30 ;
   input @'ID:' id @'Firstname:' firstname @'Lastname:' lastname ;
cards;
ID:   '100001'
Firstname:  'Lonzo'
Lastname:   'Ball'
;

If you really want to use GREP then you can use FILEVAR= option with the PIPE engine.
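
As a rough illustration of that last remark, the sketch below builds one grep command per observation and reads its output through a pipe opened with FILEVAR=. It is untested and rests on several assumptions: the macro variable path, the fileref grepout, and the output data set want_grep are hypothetical names; the files look like the sample above (label, colon, quoted value); and the SAS session is allowed to run operating system commands (XCMD).

```sas
%let path = /hypothetical/customer/files;   /* assumption: folder that holds the cus_*.txt files */

data work.want_grep;
   length cmd $600 line $200 firstname lastname $30;
   set work.test(keep=filename);

   /* One grep command per observation, e.g.
      grep -E "^(Firstname|Lastname):" "/hypothetical/customer/files/cus_01.txt" */
   cmd = catx(' ', 'grep -E', quote('^(Firstname|Lastname):'),
              quote(cats("&path/", filename)));

   /* grepout is only a placeholder fileref; FILEVAR= supplies the command to run */
   infile grepout pipe filevar=cmd end=done truncover;

   do while (not done);
      input line $char200.;
      /* keep the text after the colon, without blanks or surrounding quotes */
      if line =: 'Firstname:' then firstname = dequote(strip(scan(line, 2, ':')));
      else if line =: 'Lastname:' then lastname = dequote(strip(scan(line, 2, ':')));
   end;

   drop line;   /* cmd is a FILEVAR= variable and is not written to the output anyway */
run;
```

Each observation of TEST starts a new grep process, so for a large number of files a single grep over the whole folder, or reading the files directly in SAS, is likely to be cheaper.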

 

 

rcsutherland
Fluorite | Level 6

I cannot use cards. My customer files are really like that.

 

I'm not insisting on grep; I'm just asking what other search function I could use once INFILE has opened my text files at run time. I need to search for "Firstname" and "Lastname" and get the string next to each.

ErikLund_Jensen
Rhodochrosite | Level 12

Hi rcsutherland

 

I don't know what your customer files actually look like, but I suppose you have a good reason for wanting to extract name lines using GREP instead of reading the files and extracting the names in SAS.

 

It is not easy to get your code to work as intended. So I suggest another approach. It is just an example to show the idea, and the grep result is probably more difficult to read in real life.

 

Test input is two files in folder /sasdata/user/sasbatch/erlu named cus_01 and cus_02 with content like this:

 

Firstname Svend
Lastname Olsen

 

First step is to use grep to extract all first and last names from all files in the directory. The output from grep is 4 lines:

 

/sasdata/user/sasbatch/erlu/cus_01:Firstname Svend
/sasdata/user/sasbatch/erlu/cus_01:Lastname Olsen
/sasdata/user/sasbatch/erlu/cus_02:Firstname Erik
/sasdata/user/sasbatch/erlu/cus_02:Lastname Larsen

 

filename lfname pipe "grep 'Firstname\|Lastname' /sasdata/user/sasbatch/erlu/*";
data work.grepresult;
	length filename $100 type $10 name $60;
	infile lfname;
	input;
	filename = scan(_infile_,1,':');
	type = scan(_infile_,2,': ');
	name = scan(_infile_,3,': ');
run;

Data from this step:

 

(screenshot: fromgrep.gif)

 

Next step is to make one record per file with Firstname and Lastname. I use a full join to handle missing values in a simple way.

 

proc sql;
	create table work.custnames as
		select 
			a.name as Firstname,
			b.name as Lastname,
			coalesce(a.filename,b.filename) as filename
	from work.grepresult (where=(type = 'Firstname')) as a
	full outer join work.grepresult (where=(type = 'Lastname')) as b
	on a.filename = b.filename;
quit;

The result looks like your test table with names filled in:

 

(screenshot: result.gif)

 

 

 

If your actual data has other variables, you could join on filename.
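
A join of that kind might look like the sketch below. It is untested and assumes that work.test holds the bare file name in filename, while work.custnames holds the full path that grep printed, so the last path component is compared. The output table name test_filled is hypothetical.

```sas
proc sql;
   create table work.test_filled as
      select
         c.Firstname,
         c.Lastname,
         t.filename
      from work.test as t
      left join work.custnames as c
         /* grep printed the full path; compare only the last path component */
         on t.filename = scan(c.filename, -1, '/');
quit;
```

The left join keeps every row of work.test, so files in which grep found nothing stay in the result with blank names.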

rcsutherland
Fluorite | Level 6
Hi ErikLund_Jensen,

Thank you very much for your input. I will use FILEVAR because it's easier. 🙂
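
For readers who land here, one way the FILEVAR approach could be put together for files in the "label, colon, quoted value" layout shown earlier in the thread is sketched below. It is untested and not the poster's final code. The macro variable path, the placeholder fileref dummy, the helper variables line, key and value, and the output data set want are all assumptions made for illustration.

```sas
%let path = /hypothetical/customer/files;   /* assumption: folder that holds the cus_*.txt files */

data work.want;
   length f2r $500 line value $200 key $32 id $10 firstname lastname $30;
   set work.test(keep=filename);

   /* Full path of the file to read for this observation */
   f2r = cats("&path/", filename);
   infile dummy filevar=f2r end=done truncover;

   call missing(id, firstname, lastname);
   do while (not done);
      input line $char200.;
      key   = strip(scan(line, 1, ':'));            /* text before the colon */
      value = dequote(strip(scan(line, 2, ':')));   /* text after it, quotes removed */
      select (key);
         when ('ID')        id = value;
         when ('Firstname') firstname = value;
         when ('Lastname')  lastname = value;
         otherwise;                                 /* ignore any other line */
      end;
   end;

   drop line key value;   /* f2r is a FILEVAR= variable and is not written to the output */
run;
```

This produces one observation per file. A value that itself contains a colon would be cut short by SCAN, in which case the PRX functions mentioned earlier in the thread may be a better fit.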
