BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10

Hello,

 

I have run into a programming problem and cannot seem to find the answer in my available resources. In general, I am needing to read in data from Excel files, and one variable is an identifier of a metric for my organization (such as the graduation rate of students in our engineering college). In the example below (which is not how I read the data), the variable X has a color and year at the beginning of each value. I need to be able to read the words AFTER the year into a new variable called "Metric". 

 

 

DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

Currently, I am using some code like this to parse the string, but this works only because I have looked up the number of words for each possible value of X. This is a snippet from my Data Step ("Type" is a different variable that I pre-pend to the new variable "metric". In the first "if" statement, X is three words, and the "metric" variable only needs the last word in the string.

 

 

if countw(X, " ") = 3 then do;
			metric = (&TYPE)||' '||SCAN(X, -1, " ");
		end;
if countw(X, " ") = 5 then do;
			metric = (&TYPE)||' '||SCAN(X, -3, " ")||' '||SCAN(X, -2, " ")||' '||SCAN(X, -1, " ") ;
		end;

It seems like there should be a way to write a loop that automatically finds the length of X and then creates "metric" based on the length of that string. Can someone point me to some resources about loops that may help with this problem?

 

Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
snoopy369
Barite | Level 11

 CALL SCAN is what you want here.  Instead of just returning Nth word in the string, it returns the position (and length, which you will disregard) of the Nth word.  Then you use SUBSTR.

 


DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

data want;
  set test;
  length position length 8;
  call scan(x, 3, position, length);
  new_x = substr(x,position);
run;

View solution in original post

7 REPLIES 7
snoopy369
Barite | Level 11

 CALL SCAN is what you want here.  Instead of just returning Nth word in the string, it returns the position (and length, which you will disregard) of the Nth word.  Then you use SUBSTR.

 


DATA TEST;
INPUT X $ 1-27 ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

data want;
  set test;
  length position length 8;
  call scan(x, 3, position, length);
  new_x = substr(x,position);
run;
ballardw
Super User

Just to be clear is the example rows of data you show as entered in a single cell in Excel?

If that is the case then we a approaching a new low in allowing people to enter data in Excel.

If the data is in separate cells this is trivial.

 

How many different values of "year" do you have? Unless you have other numerals that appear before the year and your years include values before 2000 (content is important after all);

 

DATA TEST;
INPUT X $ 1-27 ;
 metric = substr(x,index(x,'2')+5);   
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

uses the position of the first 2 in the data and calculates and offset to start substring extract.

 

 

You do not mention what "&type" might contain, where it comes from or why it is in the code. So I can only show how to extract.

svh
Lapis Lazuli | Level 10 svh
Lapis Lazuli | Level 10

Thank you for suggesting a solution. Yes, the variable "x" is what is in one cell in the Excel file. (The people who use the file use the VLOOKUP function). There are currently only 5-6 values of year, but it will keep changing after each yearly cycle. 

 

&Type is just a categorical variable to help identify a class of metrics (such as "Enrollment" or "Expenses").

kiranv_
Rhodochrosite | Level 12

another way

data TEST;
 length  x $50. val $30.;
retain pattern;
 IF _N_ =1 THEN PATTERN = PRXPARSE ("/([^0-9]+$)/"); 
input x $50.;
if prxmatch(pattern, x) then 
 call prxposn(pattern, 1, position, length);
 val = substr(x,position, length);
drop pattern position length; 
datalines;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;
snoopy369
Barite | Level 11

I think I prefer the following if you're going the PRX route, given the specifications in the question.  There's not necessarily a guarantee no numerals will be in the third set of words, so it's easier to look for the word and the numbers first I think:

 

 


data want;
  set test;
  length val $50;
  PATTERN = prxparse("~.*?\d+(.*$)~io");
  
  if prxmatch(pattern, x) then 
    call prxposn(pattern, 1, position, length);
  val = substr(x,position, length);
run;
novinosrin
Tourmaline | Level 20
/*if reading a raw(external file)*/

DATA TEST;
INPUT X $ 1-27 @;
input @(anydigit(_infile_)+5) new_x  $30.;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;
Ksharp
Super User

You could use three different input method: list input and formatted input  .

 

DATA TEST;
infile cards truncover;
INPUT color : $20. year Metric $80. ;
CARDS;
APPLE 2016 SOLD ONLINE
ORANGE 2016 NON-TENURE ON TRACK
PINK 2016 CONTINUING
YELLOW 2016 TEMPORARY
;
RUN;

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2282 views
  • 4 likes
  • 6 in conversation