Solved: Using SCAN function to identify and extract character strings after delimiter

LEINAARE · Posted 01-22-2020 11:18 AM

Hello,

I have a variable that holds Yes/No responses, but some values also contain a qualitative comment after a comma (or period). I would like to split it into two variables: one containing only the Yes/No response and the other containing the qualitative response. Below is an example of the data as it is now, and what I would like.

Data as it is now:

ID  Var1
1   Yes. The training was great!
2   No, it was not helpful.
3   Yes
4   YES, the part about patient-centered counseling
5   No, see my previous comment
6   Yes, Amber was wonderful!!!

What I would like:

ID  Var1  Var2
1   Yes   The training was great!
2   No    it was not helpful.
3   Yes
4   Yes   the part about patient-centered counseling
5   No    see my previous comment
6   Yes   Amber was wonderful!!!

Most observations have Yes/No only, or are blank. Only about 10% have qualitative responses included. This was a data entry error on the part of student assistants, but it occurs fairly often across different datasets and variables. I have used the SCAN function to identify observations that need modification, but would appreciate help creating a new variable to take the qualitative value (maintaining case and punctuation as it appears).

data want;
   set have;
   if scan(upcase(VAR1),1) in ('YES','NO') then VAR2 = 'Modify';
   /*I used VAR2 as a test to see if my use of the SCAN function worked to identify observations needing modification*/
run;

I would like help assigning the qualitative values to VAR2 as they appear after the comma in VAR1, stripping the leading space, and leaving VAR1 with only the value of Yes/No.

Thank you!

Kurt_Bremser · Posted 01-22-2020 11:38 AM

Try this:

data want;
set have;
if countw(var1,' .,') > 1
then do;
  index = indexc(var1,' ,.');
  var2 = substr(var1,index+1);
  var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
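
A possible variant, not part of the reply above: the original question also asks for the leading space to be stripped from VAR2 and shows "YES" normalized to "Yes". The sketch below takes the first word with SCAN, keeps everything after that word and the one delimiter that follows it, and left-aligns the result. The helper variable w1 and the $200 length for var2 are assumptions made for illustration; SUBSTRN is used instead of SUBSTR so that a start position past the end of the value returns an empty string rather than an "invalid argument" note.

```sas
data want;
  set have;
  length var2 $200;                  /* assumed width; size it to your data */
  w1 = scan(var1, 1, ' ,.');         /* first word; delimiters are blank, comma, period */
  if upcase(w1) in ('YES', 'NO') then do;
    /* skip the first word plus the single delimiter after it, then left-align */
    var2 = left(substrn(left(var1), lengthn(w1) + 2));
    var1 = propcase(w1);             /* YES -> Yes, NO -> No */
  end;
  drop w1;                           /* w1 is a hypothetical helper, not in the original data */
run;
```

Rows whose first word is not Yes or No (including blank rows) are left unchanged, so they can still be reviewed by hand.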

mkeintz · Posted 01-22-2020 11:51 AM

Use the INDEXW function to find the character position where the second word in VAR1 resides. Then use that as the starting position in a SUBSTR function, as in:

data have;
input ID     oldVar1 :&$50.;   /* $50. so the 47-character response for ID 4 is not truncated */
datalines;
1      Yes. The training was great!
2      No, it was not helpful.
3      Yes
4      YES, the part about patient-centered counseling
5      No, see my previous comment
6      Yes, Amber was wonderful!!!
run;

data want;
  set have;
  var1=scan(oldvar1,1);
  c=indexw(oldvar1,scan(oldvar1,2));
  if c>0 then var2=substr(oldvar1,c);
run;

The "&" modifier in front of the informat in the INPUT statement tells SAS not to let a single blank terminate the incoming character value (two consecutive blanks still terminate it).

LEINAARE · Posted 01-22-2020 12:06 PM

Thank you @Kurt_Bremser & @mkeintz,

The code that both of you sent worked to solve the issue. I accepted @Kurt_Bremser because he posted his solution first. Thank you both again. This will give me some new code to experiment with.

Ted
