BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
MR_E
Obsidian | Level 7

Wow - that is impressive, as i have just mentioned to DanielSantos both approaches work great except for the char limit on the other solution, hence why i believe yours may be the answer ( no sight of macros anywhere 🙂 )

 

before i accept the solution , i wonder of there is limitations to the extent of my query.

 

i.e if we change the second keyword to later on in the text it would be an invalid answer, for example the below.

I am looking for a yellow Yacht, but if the yacht keyword was later on in the code we would still pick this up as found2 > found1 still, but if my desire was specifically Yellow Yacht, this would infact be an invalid result. ( i realise you have answered the solution im just wondering how far my query could go with horrid data)

 

data colour_text;

input text $char50.;

datalines;

i want a BLUE BALL please

did you see a blue car go by

i have owned a yellow SHIP but not a yacht before

id never buy a red ball

we want a turqoise car

she said it was a yellow plane toy

he drove in his green Porsche

i didnt like his black heart

he had a grey ball

i saw a white plane

;

run;

 

to put this in context - The issue is: The real data is actually seperating actual cities from streets ,ie i want Paris when its a city in the text, and another dataset to identify when its a street

 

21 rouge street Paris - correct result for city dataset

21 rouge street Paris invoice to 15 kruger street - would go street dataset as street>cityname (found2>found1)

 

now, one answer would be to seperate the data after one of the cities is found ,i.e city found move additional string into next field or ignore, but that would miss the street string etc (yacht car ball)

 

In summary , is it possible to identify only those variables where the street is immediately after the city name (or in our instance the type is right after the colour) only.

 

I imagine this would involve a calculation like the index value of found2 (26) is <= 6 of end of found1 (19) i.e Paris (s is 19) Street (t is 26)

 

given the number of possibilities this was why i was inclined to go with reg expressions, but as i have learnt your query is more viable.

 

Am i asking too much?

 

Hopefully i have not lost you with that, and i appreciate the assistance regardless, sensational solutions

 

DanielSantos
Barite | Level 11

 

As @Tom pointed out, the correct approach to circumvent the problem would be to remove the direct references to the macro var and replace them with the symget function (the former is resolved at compile time, the latest at runtime).

 

This will suffice:

 

data want;
     set Colour_text;
     if prxmatch("/[a-z]*("!!strip(symget('NAMES'))!!";)( car| ball| yacht)/i",text);
run;

 

This will solve the problem of the yellow yacht/(not yellow) yacht and you'll be good with a prxmatch expression up to 32K in length... But, I would not use this kind of approach with a lot of data as it's probably not the best performance wise (regular expresions are quite CPU intensive)... Give it a try to see if it works well for you.

 

More on symget: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202938.htm

 

Hope it helps.

 

Daniel Santos @ www.cgd.pt

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, thats a bit more in depth.  I would start with do some data cleaning on the data you have.  Identify all the rules you need to apply.  Split the string up into street/city parts.  Why for instance does city before street matter?  Its quite hard to try to visualise from whats provided.  Likelihood is regex or text miner would be better at parsing string data.  Set out all your data cleaning activities, implement them into a nicely structured dataset, then start processing that data.  At the moment we are just going round in cricles.

MR_E
Obsidian | Level 7
Not at all, you've greatly helped me understand a different approach to this.

I never expected a simple solution but alternative pov's are a great help

Much appreciated

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 18 replies
  • 1842 views
  • 8 likes
  • 4 in conversation