BookmarkSubscribeRSS Feed
Rickyboy
Calcite | Level 5

Good morning

 

I have another silly noob questions for you

 

I have a variable that contains a long text like

"Mary had 50 lambs"

"Julie had 100 lambs"

"Laura has 75 lambs"

 

I would like to create a computed numerical column that retrieves the number contained in the text string and will return the value 50 or 100 or 75

 

I only need to retrieve a list of 7 different numbers (10 , 20 , 50 , 75, 100, 200 , 300)

 

what are the functions or the procedures to use ?

I have tried FINDC but it does not do what I need

 

many thanks for your help

14 REPLIES 14
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Have a look at this post (the one right before yours):

https://communities.sas.com/t5/Base-SAS-Programming/How-to-keep-only-digits-when-data-contents-numer...

 

The function you want is compress() with a couple of options 'kd'.

Rickyboy
Calcite | Level 5

Hi thanks for the quick reply

 

I forgot to mention that I might have other numbers in the string

that s why I thought it would be better to match what I was looking for based on a list of matching numbers

 

I was wondering if and how I could use PRXMATCH or PRXCHANGE

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well, not sure about computed columns.  In actual code, yes you could do, the prx would be:

"([0-9]+)"
ballardw
Super User

@Rickyboy wrote:

Hi thanks for the quick reply

 

I forgot to mention that I might have other numbers in the string

that s why I thought it would be better to match what I was looking for based on a list of matching numbers

 

I was wondering if and how I could use PRXMATCH or PRXCHANGE


The you have to provide a rule for either which number you want extracted or if you want all of them then what the output looks like.

As in show the structure of the desired output data set.

 

Really best is to show some example input data and the desired output for that. Make sure your example data includes extreme values such as 1) no number at all, 2) exactly one number, 3) the largest number of numbers you expect to see. If you do not know the largest number of numbers expected then that is also important (and likely makes your data set structure a very good question in light of that info)

Rickyboy
Calcite | Level 5

The data I have is a string containing some numbers, and the numbers can be anywhere.

I am interested in only a few of these numbers , for example 50 , 75 , 100 , and this is why I m not sure the compress function will work

 

essentially I am interested in finding this list of numbers (50, 75, 100) and creating a column that contains it, (I can then convert the column to numerical from text)

 

Mary has 50 lambs, and 23 rabbits                 List i m interested in 50 , 75, 100   output       50

Julie has 20 goats and 75 lambs                                                                                             75

Michelle has 100 lambs and 10 dogs                                                                                      100

Grace has 50 goats and 13 lambs                                                                                           50

 

I hoper this makes sense

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Well (again not sure about computed fields) in code:

data want (drop=i);
  set have;
  length wrd $100;
  do i=1 to countw(thestring," ");
    wrd=scan(thestring,i," ");
    if lengthn(compress(wrd," ","kd")) > 0 then output;
  end;
run;
Rickyboy
Calcite | Level 5

Hi

 

I run the code but I get two rows of new output per original row when the string contains 2 number

so in effect the output I get is :

 

50

23

20

75

100

10

50

13

 

when all I need is just the numbers contained in a list (50 , 75, 100)

 

thanks for taking time to help me

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Lets start again.  Post test data - in the form of a dataset - using the code window which is the {i} above the post area.  Follow this post if you are unsure:

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

 

Next, post example required output based on that test data.  

 

Then we will have something definite to work to.

Rickyboy
Calcite | Level 5
namestringDesired output
rickmary had 50 lambs and 10 geese50
markgary had 20 lambs and 75 geese75
johnfred had 0.50 fish.
stevemark had 100 cats100
nickpaul had no dogs.

 

basically I only want to find 50 , 75 and 100 (3 possible cases) from the string (I have no interest for the 10 , 20 and 0.50)

RW9
Diamond | Level 26 RW9
Diamond | Level 26

That is not a datastep, I am not here to type in that data, or get it looking as it is on your machine.

 

Now in your example, how am I to know that you do not want 10 or 20, is it that you want the maximum number in the sentance?

 

You can get the maximum out using a slight modification of the code I presented earlier:

data want (drop=i);
  set have;
  length wrd $100;
  do i=1 to countw(thestring," ");
    wrd=scan(thestring,i," ");
    if lengthn(compress(wrd," ","kd")) > 0 then do;
if input(wrd,best.) > max_num then max_num=input(wrd,best.);
end; end; run;

This will for each line take any number (which is all numeric separated by spaces) and keep the maximum from each it finds.  Note I can't test this as have no test data to use. 

Rickyboy
Calcite | Level 5

Sorry I am a noob

 

no I don't want the maximum number , just the number that matches from a list of my numbers

which in this case is 50 , 75 and 100

 

If I had 200 I would still want to discard it

 

I thought you could copy and paste the data easily and I don't know what a data step is

RW9
Diamond | Level 26 RW9
Diamond | Level 26

A datastep is something which begins data and ends run; which I am posting.  Maybe this isn't the solution for you then.  This computed thing leads me to believe you are using Visual Analytics or something like that yes.  If so then its perhaps not a straight forward answer.  Computed columns only work as part of a select clause in SQL, you could try (and this is pushing it):

catx(',',ifc(findw(yourstring,"50"," "),"50",""),ifc(findw(yourstring,"75"," "),"75",""),ifc(findw(yourstring,"100"," "),"100",""))

This basically does an if statement for each of 50,75,100, and if it finds it in the string, then puts it back for the catx() to concat all results into a comma delimited list.

Rickyboy
Calcite | Level 5

@RW9

 

Thanks for your help

 

in the end I had to use excel

I created a summary statistics of all the 1200 possible cases I had and for each one I created the desired response value I wanted in a separate excel column

I then did a left join with the original dataset

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 2877 views
  • 0 likes
  • 3 in conversation