SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Use of Substr or Scan

Accepted Solution Solved
Reply
Frequent Contributor
Posts: 89
Accepted Solution

Use of Substr or Scan

Hi I need some help. I have the variable like site with values

Site

CT-BENGALURU-J.P.NAGAR

CT-BENGALURU-ORION MALL

CT-BENGALURU-SOUL SPACE SPIRIT

CT-MUMBAI-GOREGAON-OBEROI MALL

CT-PUNE-ASCENT MALL

CT-PUNE-MSM PARANJAPE MALL-KARVE ROAD

I want output like 

J.P.NAGAR

ORION MALL

SOUL SPACE SPIRIT

OBEROI MALL

ASCENT MALL

PARANJAPE MALL

 


Accepted Solutions
Solution
‎01-21-2016 07:16 AM
Super User
Posts: 9,682

Re: Use of Substr or Scan

data have;
input Site $80.;
length want $ 80;
pid=prxparse('/\w+\s+MALL/i');
if prxmatch(pid,site) then do;
 call prxsubstr(pid,site,p,l);
 want=substr(site,p,l);
 end;
 else want=scan(site,-1,'-');
drop p l;
cards;
CT-BENGALURU-J.P.NAGAR
CT-BENGALURU-ORION MALL
CT-BENGALURU-SOUL SPACE SPIRIT
CT-MUMBAI-GOREGAON-OBEROI MALL
CT-PUNE-ASCENT MALL
CT-PUNE-MSM PARANJAPE MALL-KARVE ROAD
;
run;

View solution in original post


All Replies
Super User
Posts: 6,946

Re: Use of Substr or Scan

I can't see a consistent rule here.

Usually it's the last "word" of those separatd by hyphens, but in the last line it is the second (and not even the whole second) and not the last. Without a consistent rule, no algorithm can be built.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Frequent Contributor
Posts: 89

Re: Use of Substr or Scan

Actually there is data inconsistency.

Super User
Posts: 5,257

Re: Use of Substr or Scan

Looks like addresses, right?

Like @KurtBremser said, there seems not be a clear rule.

That leaves us to do data standardization.

And I suspect that addresses are available in the SA Data Quality Base, India edition, contained in the data flux product(s).

Data never sleeps
Super User
Posts: 5,085

Re: Use of Substr or Scan

Here are some statements that approximate what you are trying to do:

 

new_site = scan(site, 3, '-');

 

new_site = scan(site, -1, '-');

 

The reason I say "approximate" is because you have rules in your head that are not part of the program.  For example, consider:

 

CT-PUNE-MSM PARANJAPE MALL-KARVE ROAD

 

Why should the result be PARANJAPE MALL instead of MSM PARANJAPE MALL?  You have some rules about that, but all of your rules have to be made known in order to incorporate them into the program.

 

Good luck.

 

Solution
‎01-21-2016 07:16 AM
Super User
Posts: 9,682

Re: Use of Substr or Scan

data have;
input Site $80.;
length want $ 80;
pid=prxparse('/\w+\s+MALL/i');
if prxmatch(pid,site) then do;
 call prxsubstr(pid,site,p,l);
 want=substr(site,p,l);
 end;
 else want=scan(site,-1,'-');
drop p l;
cards;
CT-BENGALURU-J.P.NAGAR
CT-BENGALURU-ORION MALL
CT-BENGALURU-SOUL SPACE SPIRIT
CT-MUMBAI-GOREGAON-OBEROI MALL
CT-PUNE-ASCENT MALL
CT-PUNE-MSM PARANJAPE MALL-KARVE ROAD
;
run;
Frequent Contributor
Posts: 89

Re: Use of Substr or Scan

Thanks a lot.

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 483 views
  • 2 likes
  • 5 in conversation