Good morning
I have another silly noob questions for you
I have a variable that contains a long text like
"Mary had 50 lambs"
"Julie had 100 lambs"
"Laura has 75 lambs"
I would like to create a computed numerical column that retrieves the number contained in the text string and will return the value 50 or 100 or 75
I only need to retrieve a list of 7 different numbers (10 , 20 , 50 , 75, 100, 200 , 300)
what are the functions or the procedures to use ?
I have tried FINDC but it does not do what I need
many thanks for your help
Have a look at this post (the one right before yours):
The function you want is compress() with a couple of options 'kd'.
Hi thanks for the quick reply
I forgot to mention that I might have other numbers in the string
that s why I thought it would be better to match what I was looking for based on a list of matching numbers
I was wondering if and how I could use PRXMATCH or PRXCHANGE
Well, not sure about computed columns. In actual code, yes you could do, the prx would be:
"([0-9]+)"
@Rickyboy wrote:
Hi thanks for the quick reply
I forgot to mention that I might have other numbers in the string
that s why I thought it would be better to match what I was looking for based on a list of matching numbers
I was wondering if and how I could use PRXMATCH or PRXCHANGE
The you have to provide a rule for either which number you want extracted or if you want all of them then what the output looks like.
As in show the structure of the desired output data set.
Really best is to show some example input data and the desired output for that. Make sure your example data includes extreme values such as 1) no number at all, 2) exactly one number, 3) the largest number of numbers you expect to see. If you do not know the largest number of numbers expected then that is also important (and likely makes your data set structure a very good question in light of that info)
The data I have is a string containing some numbers, and the numbers can be anywhere.
I am interested in only a few of these numbers , for example 50 , 75 , 100 , and this is why I m not sure the compress function will work
essentially I am interested in finding this list of numbers (50, 75, 100) and creating a column that contains it, (I can then convert the column to numerical from text)
Mary has 50 lambs, and 23 rabbits List i m interested in 50 , 75, 100 output 50
Julie has 20 goats and 75 lambs 75
Michelle has 100 lambs and 10 dogs 100
Grace has 50 goats and 13 lambs 50
I hoper this makes sense
Well (again not sure about computed fields) in code:
data want (drop=i); set have; length wrd $100; do i=1 to countw(thestring," "); wrd=scan(thestring,i," "); if lengthn(compress(wrd," ","kd")) > 0 then output; end; run;
Hi
I run the code but I get two rows of new output per original row when the string contains 2 number
so in effect the output I get is :
50
23
20
75
100
10
50
13
when all I need is just the numbers contained in a list (50 , 75, 100)
thanks for taking time to help me
Lets start again. Post test data - in the form of a dataset - using the code window which is the {i} above the post area. Follow this post if you are unsure:
Next, post example required output based on that test data.
Then we will have something definite to work to.
name | string | Desired output |
rick | mary had 50 lambs and 10 geese | 50 |
mark | gary had 20 lambs and 75 geese | 75 |
john | fred had 0.50 fish | . |
steve | mark had 100 cats | 100 |
nick | paul had no dogs | . |
basically I only want to find 50 , 75 and 100 (3 possible cases) from the string (I have no interest for the 10 , 20 and 0.50)
That is not a datastep, I am not here to type in that data, or get it looking as it is on your machine.
Now in your example, how am I to know that you do not want 10 or 20, is it that you want the maximum number in the sentance?
You can get the maximum out using a slight modification of the code I presented earlier:
data want (drop=i); set have; length wrd $100; do i=1 to countw(thestring," "); wrd=scan(thestring,i," "); if lengthn(compress(wrd," ","kd")) > 0 then do;
if input(wrd,best.) > max_num then max_num=input(wrd,best.);
end; end; run;
This will for each line take any number (which is all numeric separated by spaces) and keep the maximum from each it finds. Note I can't test this as have no test data to use.
Sorry I am a noob
no I don't want the maximum number , just the number that matches from a list of my numbers
which in this case is 50 , 75 and 100
If I had 200 I would still want to discard it
I thought you could copy and paste the data easily and I don't know what a data step is
A datastep is something which begins data and ends run; which I am posting. Maybe this isn't the solution for you then. This computed thing leads me to believe you are using Visual Analytics or something like that yes. If so then its perhaps not a straight forward answer. Computed columns only work as part of a select clause in SQL, you could try (and this is pushing it):
catx(',',ifc(findw(yourstring,"50"," "),"50",""),ifc(findw(yourstring,"75"," "),"75",""),ifc(findw(yourstring,"100"," "),"100",""))
This basically does an if statement for each of 50,75,100, and if it finds it in the string, then puts it back for the catx() to concat all results into a comma delimited list.
I m using enterprise guide
Thanks for your help
in the end I had to use excel
I created a summary statistics of all the 1200 possible cases I had and for each one I created the desired response value I wanted in a separate excel column
I then did a left join with the original dataset
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.