Ksuslik
Fluorite | Level 6

Hello,

I have a SAS dataset with the following fields:

1. Questions: character $1024; it contains questions that an interviewer asks an interviewee. For example: "Hey, John, how do you deal with stress?"

2. FirstName: character $200; "John         "

 

The dataset consists of thousands of rows of questions and potential first names that can be used in these questions. I want my program to figure out which Questions values contain a reference to the FirstName value on the same row.

 

I tried using the FINDW function, but it is not finding anything. I think this is because my name values have so many blank characters after the name.

 

Thank you so much for any suggestions! 

ballardw
Super User

If this was from some sort of survey software, the most common way I know to create garbage like this, then you may want to investigate the options that created the file you are using. I would expect somewhere to be a variable that captured that value of "John" to use in the first place and make sure that is exported along with the data.

 

Names in general are one of the messiest forms of data and trying to identify a "name" from general rules is not going to be easy.

If you provide a text example of every single question that might contain the "name", it may be possible to parse the name out of specific pieces of text, but I would suggest avoiding that if at all possible.

 

For example, take your "Hey, John, how do you deal with stress?" (BTW, starting a question with "hey" is very bad form in general). If we identify the word "Hey" at the start (FINDW looking for word position 1) and the remainder string "how do you deal with stress?" using the INDEX function, then it is possible to remove everything else, leaving a name.

 

But if this is something like typed text from a recording and the question text varies from person to person, such as "Mary, how do you try to deal with stress ?", I wish you lots of luck.

Does your data have anything resembling a personal identifier, or a number of fields that uniquely identify a person? That would make more sense to deal with, and maybe link to one question that is consistent to pull a "name".

Ksuslik
Fluorite | Level 6

This is indeed survey data. I know who they interview, so I could match names of interviewees with questions. What I am trying to figure out is whether SAS has some way of identifying text that contains specific words, such as names, similar to regular expressions in Python.

ballardw
Super User

SAS has regular expressions. Look up the PRXPARSE, PRXMATCH, PRXCHANGE, PRXPAREN, PRXPOSN and related functions and their CALL versions.
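
A minimal sketch of the regular-expression route, assuming the dataset `b1` and the variables `question` and `firstName` that appear later in this thread. The output variable `mention_rx` is a made-up name, and the sketch assumes the names contain only letters (regex metacharacters inside a name would need escaping first).

```sas
data b2;
  set b1;
  /* Build a pattern such as /\bJohn\b/i:                         */
  /*   \b marks a word boundary, the trailing i ignores case.     */
  /* CATS removes leading and trailing blanks from firstName, so  */
  /* the padding of the $200 variable does not end up in the      */
  /* pattern.                                                     */
  if not missing(firstName) then
    mention_rx = prxmatch(cats('/\b', firstName, '\b/i'), question);
  else
    mention_rx = 0;   /* no name on this row, so nothing to look for */
run;
```

PRXMATCH returns the position where the match starts, or 0 when there is no match, so `mention_rx > 0` flags the rows of interest. Because the pattern differs from row to row, it is compiled on each call; for a few thousand rows that is unlikely to matter.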

PaigeMiller
Diamond | Level 26

FINDW should work.

 

Whenever you have code that isn't working, SHOW US the LOG. We need to see the entire DATA step where you use FINDW, as seen in the log, from the DATA statement all the way down to the last NOTE: after the DATA step, all of it, 100% of it, with nothing chopped out, verbatim.

 

Please format the log properly for readability, by pasting it as text into the window that appears when you click on the </> icon. DO NOT SKIP THIS STEP.

--
Paige Miller
Ksuslik
Fluorite | Level 6

Here is what I tell SAS to do:

data b2;
  set b1;
  mention = findw(question, firstName);
run;

The log says that 1,500 observations were read from b1 and that b2 has 1,500 observations. Just the standard notes from SAS.

ballardw
Super User

And what do you expect it to do?

FINDW returns the position of the value of the variable firstName inside the string question. If firstName is not defined or has a missing value, then the function returns 0 because there is nothing to find. If firstName has a value and it is present as a word, then the function returns the character position where it starts. So if mention is > 0, then the name was found.

As shown, your code would not match "JOHN" with "John" or "john".

Ksuslik
Fluorite | Level 6

how do I make sure it ignores lower/upper case?

PaigeMiller
Diamond | Level 26

This has not been mentioned before.

 

The FINDW function has lots of options, one of which does exactly what you want. Always good to check the documentation to see if there is an option that does what you want. You can find the FINDW documentation here: https://documentation.sas.com/?cdcId=pgmmvacdc&cdcVersion=9.4&docsetId=lefunctionsref&docsetTarget=p...
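
For readers who cannot follow the truncated link, here is a hedged sketch of what such a call can look like, reusing `b1`, `question` and `firstName` from the code posted above. The fourth argument of FINDW holds modifiers: `i` ignores case and `t` trims trailing blanks from the arguments. The delimiter list in the third argument is an assumption chosen for this example and should be adjusted to the punctuation that actually occurs in the data.

```sas
data b2;
  set b1;
  /* Arguments: string, word, delimiter characters, modifiers.       */
  /* ' ,.;:!?' lists the characters that may surround a word.        */
  /* 'i' makes the comparison case-insensitive.                      */
  /* 't' trims trailing blanks from the string, word and delimiters. */
  mention = findw(question, firstName, ' ,.;:!?', 'it');
run;
```

A value of `mention` greater than 0 means the name was found as a whole word. Rows where `firstName` is entirely blank are worth checking separately.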

--
Paige Miller
PaigeMiller
Diamond | Level 26

@Ksuslik wrote:

here is what I tell sas to do:

 

data b2;
  set b1;
  mention = findw(question, firstName);
run;

 

the log says that there were 1,500 observations read from b1 and b2 has 1,500 observations. Just a standard line from sas


I want to see the log, the entire log for the DATA step where FINDW is used; I do not want you to select portions of the log to show me. I also want the log properly formatted (directions given previously).

--
Paige Miller
Ksuslik
Fluorite | Level 6

I found this topic that answered my question:

 

https://communities.sas.com/t5/SAS-Programming/find-words-inside-string-words-inside-array/m-p/22651...

 

I just needed to use the STRIP function to remove the blanks after the name, and it works!

PaigeMiller
Diamond | Level 26

I can't see how the STRIP function will have any impact on what you have described to us, especially the part where you said

 

how do I make sure it ignores lower/upper case?

--
Paige Miller
Ksuslik
Fluorite | Level 6
It turns out the problem was not lower/upper case, but the trailing blanks that SAS stores in the character variable. My FirstName field was stored by SAS as a 200-character variable, so the name was padded with blanks. The STRIP function got rid of them.
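
A small self-contained sketch of the fix described above, using the example question from the first post. The dataset name `demo` and the variables `padded` and `stripped` are made up for illustration; the lengths mirror the $1024 and $200 definitions given in the question.

```sas
data demo;
  length question $1024 firstName $200;
  question  = 'Hey, John, how do you deal with stress?';
  firstName = 'John';   /* stored as John followed by 196 blanks */

  /* The search word still carries its trailing blanks. */
  padded   = findw(question, firstName);

  /* STRIP removes leading and trailing blanks before the search. */
  stripped = findw(question, strip(firstName));

  /* Write both results to the log for comparison. */
  put padded= stripped=;
run;
```

A result greater than 0 means the name was found as a word. Comparing the two values in the log shows whether the padding is what blocks the match, which is what the original poster reported.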
