Solved: Re: Storing a list of values (from a dataset) in a macro variable to u... - Page 2

MR_E · Posted 01-10-2017 10:07 AM

Wow - that is impressive, as i have just mentioned to DanielSantos both approaches work great except for the char limit on the other solution, hence why i believe yours may be the answer ( no sight of macros anywhere 🙂 )

before i accept the solution , i wonder of there is limitations to the extent of my query.

i.e if we change the second keyword to later on in the text it would be an invalid answer, for example the below.

I am looking for a yellow Yacht, but if the yacht keyword was later on in the code we would still pick this up as found2 > found1 still, but if my desire was specifically Yellow Yacht, this would infact be an invalid result. ( i realise you have answered the solution im just wondering how far my query could go with horrid data)

data colour_text;

input text $char50.;

datalines;

i want a BLUE BALL please

did you see a blue car go by

i have owned a yellow SHIP but not a yacht before

id never buy a red ball

we want a turqoise car

she said it was a yellow plane toy

he drove in his green Porsche

i didnt like his black heart

he had a grey ball

i saw a white plane

;

run;

to put this in context - The issue is: The real data is actually seperating actual cities from streets ,ie i want Paris when its a city in the text, and another dataset to identify when its a street

21 rouge street Paris - correct result for city dataset

21 rouge street Paris invoice to 15 kruger street - would go street dataset as street>cityname (found2>found1)

now, one answer would be to seperate the data after one of the cities is found ,i.e city found move additional string into next field or ignore, but that would miss the street string etc (yacht car ball)

In summary , is it possible to identify only those variables where the street is immediately after the city name (or in our instance the type is right after the colour) only.

I imagine this would involve a calculation like the index value of found2 (26) is <= 6 of end of found1 (19) i.e Paris (s is 19) Street (t is 26)

given the number of possibilities this was why i was inclined to go with reg expressions, but as i have learnt your query is more viable.

Am i asking too much?

Hopefully i have not lost you with that, and i appreciate the assistance regardless, sensational solutions

DanielSantos · Posted 01-10-2017 11:21 AM

As @Tom pointed out, the correct approach to circumvent the problem would be to remove the direct references to the macro var and replace them with the symget function (the former is resolved at compile time, the latest at runtime).

This will suffice:

data want;
set Colour_text;
if prxmatch("/[a-z]*("!!strip(symget('NAMES'))!!";)( car| ball| yacht)/i",text);
run;

This will solve the problem of the yellow yacht/(not yellow) yacht and you'll be good with a prxmatch expression up to 32K in length... But, I would not use this kind of approach with a lot of data as it's probably not the best performance wise (regular expresions are quite CPU intensive)... Give it a try to see if it works well for you.

More on symget: http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000202938.htm

Hope it helps.

Daniel Santos @ www.cgd.pt

RW9 · Posted 01-10-2017 11:49 AM

Yes, thats a bit more in depth. I would start with do some data cleaning on the data you have. Identify all the rules you need to apply. Split the string up into street/city parts. Why for instance does city before street matter? Its quite hard to try to visualise from whats provided. Likelihood is regex or text miner would be better at parsing string data. Set out all your data cleaning activities, implement them into a nicely structured dataset, then start processing that data. At the moment we are just going round in cricles.

MR_E · Posted 01-10-2017 12:27 PM

Not at all, you've greatly helped me understand a different approach to this.

I never expected a simple solution but alternative pov's are a great help

Much appreciated

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Catch up on SAS Innovate 2026

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Re: Storing a list of values (from a dataset) in a macro variable to use in a regular expression sea

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away