DATA Step, Macro, Functions and more

How to Read a List of Words in Values with Different Lengths

Accepted Solution Solved
Reply
Occasional Contributor svh
Occasional Contributor
Posts: 16
Accepted Solution

How to Read a List of Words in Values with Different Lengths

Hello,

 

I have run into a programming problem and cannot seem to find the answer in my available resources. In general, I am needing to read in data from Excel files, and one variable is an identifier of a metric for my organization (such as the graduation rate of students in our engineering college). In the example below (which is not how I read the data), the variable X has a color and year at the beginning of each value. I need to be able to read the words AFTER the year into a new variable called "Metric". 

 

 

DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

Currently, I am using some code like this to parse the string, but this works only because I have looked up the number of words for each possible value of X. This is a snippet from my Data Step ("Type" is a different variable that I pre-pend to the new variable "metric". In the first "if" statement, X is three words, and the "metric" variable only needs the last word in the string.

 

 

if countw(X, " ") = 3 then do;
			metric = (&TYPE)||' '||SCAN(X, -1, " ");
		end;
if countw(X, " ") = 5 then do;
			metric = (&TYPE)||' '||SCAN(X, -3, " ")||' '||SCAN(X, -2, " ")||' '||SCAN(X, -1, " ") ;
		end;

It seems like there should be a way to write a loop that automatically finds the length of X and then creates "metric" based on the length of that string. Can someone point me to some resources about loops that may help with this problem?

 

Thanks!


Accepted Solutions
Solution
‎02-08-2018 08:36 AM
Super Contributor
Posts: 320

Re: How to Read a List of Words in Values with Different Lengths

 CALL SCAN is what you want here.  Instead of just returning Nth word in the string, it returns the position (and length, which you will disregard) of the Nth word.  Then you use SUBSTR.

 


DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

data want;
  set test;
  length position length 8;
  call scan(x, 3, position, length);
  new_x = substr(x,position);
run;

View solution in original post


All Replies
Solution
‎02-08-2018 08:36 AM
Super Contributor
Posts: 320

Re: How to Read a List of Words in Values with Different Lengths

 CALL SCAN is what you want here.  Instead of just returning Nth word in the string, it returns the position (and length, which you will disregard) of the Nth word.  Then you use SUBSTR.

 


DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

data want;
  set test;
  length position length 8;
  call scan(x, 3, position, length);
  new_x = substr(x,position);
run;
Super User
Posts: 13,300

Re: How to Read a List of Words in Values with Different Lengths

Just to be clear is the example rows of data you show as entered in a single cell in Excel?

If that is the case then we a approaching a new low in allowing people to enter data in Excel.

If the data is in separate cells this is trivial.

 

How many different values of "year" do you have? Unless you have other numerals that appear before the year and your years include values before 2000 (content is important after all);

 

DATA TEST;
INPUT X $ 1-27 ;
 metric = substr(x,index(x,'2')+5);   
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

uses the position of the first 2 in the data and calculates and offset to start substring extract.

 

 

You do not mention what "&type" might contain, where it comes from or why it is in the code. So I can only show how to extract.

Occasional Contributor svh
Occasional Contributor
Posts: 16

Re: How to Read a List of Words in Values with Different Lengths

Thank you for suggesting a solution. Yes, the variable "x" is what is in one cell in the Excel file. (The people who use the file use the VLOOKUP function). There are currently only 5-6 values of year, but it will keep changing after each yearly cycle. 

 

&Type is just a categorical variable to help identify a class of metrics (such as "Enrollment" or "Expenses").

PROC Star
Posts: 503

Re: How to Read a List of Words in Values with Different Lengths

another way

data TEST;
 length  x $50. val $30.;
retain pattern;
 IF _N_ =1 THEN PATTERN = PRXPARSE ("/([^0-9]+$)/"); 
input x $50.;
if prxmatch(pattern, x) then 
 call prxposn(pattern, 1, position, length);
 val = substr(x,position, length);
drop pattern position length; 
datalines;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;
Super Contributor
Posts: 320

Re: How to Read a List of Words in Values with Different Lengths

I think I prefer the following if you're going the PRX route, given the specifications in the question.  There's not necessarily a guarantee no numerals will be in the third set of words, so it's easier to look for the word and the numbers first I think:

 

 


data want;
  set test;
  length val $50;
  PATTERN = prxparse("~.*?\d+(.*$)~io");
  
  if prxmatch(pattern, x) then 
    call prxposn(pattern, 1, position, length);
  val = substr(x,position, length);
run;
PROC Star
Posts: 1,567

Re: How to Read a List of Words in Values with Different Lengths

/*if reading a raw(external file)*/

DATA TEST;
INPUT X $ 1-27 @;
input @(anydigit(_infile_)+5) new_x  $30.;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;
Super User
Posts: 10,686

Re: How to Read a List of Words in Values with Different Lengths

You could use three different input method: list input and formatted input  .

 

DATA TEST;
infile cards truncover;
INPUT color : $20. year Metric $80. ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 191 views
  • 4 likes
  • 6 in conversation