DATA Step, Macro, Functions and more

I want to extract 3rd word from each line using character function other than scan function.

New Contributor
Posts: 3


Hi all,

data a;
input details &:$50.;
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;

I want to extract the third word from each line using a character function other than scan().

The new variable's values should be:

is
from
nearer

Thanks in advance.


Super User
Posts: 10,209

Re: I want to extract 3rd word from each line using character function other than scan function.

"I want to extract 3rd word from each line using character function other than scan function."

 

And next you try to run without using your feet?

The scan() function is THE tool for this, use it.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
How to post code
Super User
Posts: 10,209

Re: I want to extract 3rd word from each line using character function other than scan function.

If this is actually some kind of homework, then you are supposed to solve it, not us.

Hints:

Use a loop with the findc() function to find the second occurrence of a blank. Then find the next occurrence of a blank (keep in mind that all character variables are padded with blanks), and use substr() with the positions you determined.

New Contributor
Posts: 3

Re: I want to extract 3rd word from each line using character function other than scan function.

Posted in reply to KurtBremser
Hi Mr. Bremser,
I am trying, but how do I find the second occurrence of a blank?

data b;
set a;
info=findc(details," ");
run;

It shows the first occurrence of a blank, but not the second.
Please help.
Thanks
Super User
Posts: 10,209

Re: I want to extract 3rd word from each line using character function other than scan function.

As I said, you need a loop to find the second occurrence. One feature of findc() is that you can supply a start position for the search.

start = findc(details,' ');

will find the first occurrence.

start = findc(details,' ',start+1);

will find the next occurrence after that. If you repeat that until you get the nth occurrence (the 2nd, if you want the 3rd word), you have the start for your substring. How to find the end should be quite obvious now. Note that the third argument of substr() has to be a length, not a position, so you need to do a little calculation there.
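A minimal sketch of that loop, generalized to the n-th word. It assumes single blanks between words and at least n words per line; the names `n`, `pos`, `endpos` and `word` are illustrative, and the sample lines from the question are inlined so the step runs on its own.

```sas
/* Sketch: find the n-th word with a FINDC loop instead of SCAN.        */
/* Assumes single blanks between words and at least n words per line;   */
/* the trailing padding of DETAILS ($50) supplies the blank after the   */
/* last word.                                                           */
data want;
  input details & $50.;
  length word $50;
  n = 3;                                   /* which word to extract (illustrative) */
  pos = 0;                                 /* position of the blank before word n  */
  do i = 1 to n - 1;                       /* step to the (n-1)th blank            */
    pos = findc(details, ' ', pos + 1);    /* search starts just past the last hit */
  end;
  endpos = findc(details, ' ', pos + 1);   /* blank that ends word n               */
  word = substr(details, pos + 1, endpos - pos - 1);  /* SUBSTR needs a length     */
  drop n i pos endpos;
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;
```

Setting `n = 1` skips the loop entirely, so the same step also returns the first word.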

New Contributor
Posts: 3

Re: I want to extract 3rd word from each line using character function other than scan function.

Posted in reply to KurtBremser
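Hi Kurt Bremser,
Thanks for your suggestions.
Here is the code:

/*without scan function*/
data c;
set a;
info=findc(details," ");                  /* 1st blank */
info=findc(details," ",info+1);           /* 2nd blank: the 3rd word starts after it */
infon=findc(details," ",info+1);          /* 3rd blank: the 3rd word ends before it */
ext=substr(details,info+1,infon-info-1);  /* substr() takes a length, not an end position */
run;

Finally I got it.
Thanks for the support.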

Partha.

Trusted Advisor
Posts: 1,837

Re: I want to extract 3rd word from each line using character function other than scan function.

Since you do not want to use the SCAN function, you can loop through the line character by character with the SUBSTR function, check each character for a space (or any other delimiter), count the spaces, and keep all characters between the 2nd and the 3rd space/delimiter.
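A minimal sketch of this character-by-character approach, assuming single blanks between words. The names `ch`, `blanks` and `thirdword` are illustrative, and the sample lines from the question are inlined so the step runs on its own.

```sas
/* Sketch: walk DETAILS one character at a time with SUBSTR, count blanks, */
/* and collect the characters that lie between the 2nd and 3rd blank.      */
/* Assumes words are separated by single blanks.                           */
data want;
  input details & $50.;
  length thirdword $50 ch $1;
  blanks = 0;                                 /* blanks seen so far        */
  thirdword = ' ';
  do i = 1 to length(details) until (blanks = 3);
    ch = substr(details, i, 1);               /* current character         */
    if ch = ' ' then blanks = blanks + 1;     /* delimiter: count it       */
    else if blanks = 2 then thirdword = cats(thirdword, ch);  /* in word 3 */
  end;
  drop ch blanks i;
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;
```

Because LENGTH() stops at the last non-blank character, this version also copes with lines where the third word is the last word.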

Super Contributor
Posts: 339

Re: I want to extract 3rd word from each line using character function other than scan function.

Hi,

Well done on presenting your input data in the form of a data step.

Is there a reason you want to avoid the scan() function? For example: this is a homework exercise, curiosity, you want to practice using other character functions, your boss said so (in which case ask your boss why), etc.

As @KurtBremser has already indicated, the words are separated by spaces, so scan() would ordinarily be the way to go. Telling us why you don't want to use scan() would help us understand your thinking, so that we can advise accordingly.

Regards,

Amir.

Super Contributor
Posts: 339

Re: I want to extract 3rd word from each line using character function other than scan function.

Hi,

A less serious, but potentially still valid, response: if you're not allowed to use the scan() function, then how about using the %scan() function?

 

EDIT: Or even the scanq() function?
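A short sketch of both suggestions. SCANQ behaves like SCAN but ignores delimiters inside quotation marks, and with no delimiter argument it splits on whitespace. %SCAN is a macro function that operates on text rather than on data set values, so it is shown only on a literal copy of the first line. The sample lines are inlined so the step runs on its own.

```sas
/* SCANQ: word extraction like SCAN; default delimiters are whitespace. */
data want;
  input details & $50.;
  length thirdword $50;
  thirdword = scanq(details, 3);
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;

/* %SCAN works on macro text, not on data set values. */
%put %scan(my name is parthasaradhi, 3);   /* writes: is */
```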

 

Regards,

Amir.

 

PROC Star
Posts: 260

Re: I want to extract 3rd word from each line using character function other than scan function.

Many ways to skin that cat. I would suggest using PRX (Perl Regular Expressions), as they are worth learning:

data want;
  set a;
  prxid=prxparse('/^\s*\S+\s+\S+\s+(\S+)/');
  drop prxid;
  length thirdword $10;
  if prxmatch(prxid,details) then
    thirdword=prxposn(prxid,1,details);
run;

A short explanation:

^ Means beginning of string

\s* Means zero or more blanks (or other whitespace characters, such as tabs)

\S+ Means one or more non-blanks (non-whitespace)

\s+ Means one or more blanks

(\S+) puts the third occurrence of one or more non-blanks in a capture buffer (the first and only one in this case).

PRXPOSN then retrieves the contents of the first capture buffer.
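A variant sketch of the same pattern using PRXCHANGE, which rewrites the whole blank-padded line to just capture buffer 1 (`$1`). This swaps in a different PRX function than the post uses; because PRXCHANGE returns the line unchanged when the pattern does not match, a PRXMATCH guard is kept for lines with fewer than three words. The sample lines are inlined so the step runs on its own.

```sas
/* Variant sketch: replace the entire line with the third word.         */
/* ".*$" consumes the rest of the line, including the trailing padding. */
data want;
  input details & $50.;
  length thirdword $10;
  if prxmatch('/^\s*\S+\s+\S+\s+\S+/', details) then
    thirdword = prxchange('s/^\s*\S+\s+\S+\s+(\S+).*$/$1/', 1, details);
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;
```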

 

But sometimes you need to do stuff where it would be very handy if the variable you wanted to parse was in a file, so that you could use INPUT statements to parse it. In that case you do not have to write the whole dataset to a file and then read it, you can just use the fact that the _INFILE_ variable contains the input buffer, and it can be written to:

data want;
  infile sasautos(verify.sas) ;
  if _N_= 1 then input @@;
  set a;
  _infile_=details;
  input @1 thirdword $ thirdword $ thirdword $ @@ ;
run;

I used SASAUTOS(VERIFY.SAS) as the infile, as the file must exist, and this macro seems to exist on most SAS installations.

Super User
Posts: 10,761

Re: I want to extract 3rd word from each line using character function other than scan function.

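data a;
input details &:$50.;
details=compbl(details);      /* collapse runs of blanks into one blank */
n=0;                          /* number of blanks seen so far */
do i=1 to length(details);
 if char(details,i)=' ' then do;
  n+1;
  if n=2 then s=i;            /* blank before the 3rd word */
  if n=3 then e=i;            /* blank after the 3rd word */
 end;
end;
if missing(e) then e=length(details)+1;  /* 3rd word is the last word */
want=substr(details,s+1,e-s-1);          /* skip the blank at s */
drop n i s e;
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;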
PROC Star
Posts: 1,767

Re: I want to extract 3rd word from each line using character function other than scan function.

data a;
input details & $50.;
cards;
my name is parthasaradhi
i am from dowlaiswaram
it is nearer to rajahmundry
;
run;

data output;
set a;
k=substr(details,anyspace(details)+1);   /* drop the 1st word */
k1=substr(k,anyspace(strip(k))+1);       /* drop the 2nd word */
want=substr(k1,1,anyspace(k1));          /* keep up to the next blank: the 3rd word */
drop k:;
run;