BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
LEINAARE
Obsidian | Level 7

Hello,

 

I would like to extract qualitative responses after encountering a comma, following Yes/No responses and create two variables, one containing yes/no responses, and the other containing qualitative responses.  Below is an example of the data as it is now, and what I would like.

 

Data as it is now:

 

ID     Var1

1      Yes. The training was great!

2      No, it was not helpful.

3      Yes

4      YES, the part about patient-centered counseling

5       No, see my previous comment

6       Yes, Amber was wonderful!!!

 

What I would like:

 

ID    Var1      Var2

1      Yes       The training was great!

2      No         it was not helpful.

3      Yes

4      Yes        the part about patient-centered counseling

5      No         see my previous comment

6      Yes        Amber was wonderful!!!

 

Most observations have Yes/No only, or are blank.  Only about 10% have qualitative responses included.  This was a data entry error on the part of student assistants, but it occurs fairly often across different datasets and variables. I have used the SCAN function to identify observations that need modification, but would appreciate help creating a new variable to take the qualitative value (maintaining case and punctuation as it appears).

 

data want;
   set have;
   if scan(upcase(VAR1,1)) in ('YES','NO') then VAR2 = 'Modify';
   /*I used VAR2 as a test to see if my use of the SCAN function worked to identify observations needing modification*/
run;

 

I would like help assigning the qualitative values to VAR2 as they appear after the comma in VAR1, stripping the leading space, and leaving VAR1 with only the vale of Yes/No.

 

Thank you!

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Try this:

data want;
set have;
if countw(var1,' .,') > 1
then do;
  index = indexc(var1,' ,.');
  var2 = substr(var1,index+1);
  var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

Try this:

data want;
set have;
if countw(var1,' .,') > 1
then do;
  index = indexc(var1,' ,.');
  var2 = substr(var1,index+1);
  var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
mkeintz
PROC Star

Use the INDEXW function to find the character position where the second word in VAR1 resides.  Then use that as the starting position in a SUBSTR function, as in:

 

data have;
input ID     oldVar1 :&$30.;
datalines;
1      Yes. The training was great!
2      No, it was not helpful.
3      Yes
4      YES, the part about patient-centered counseling
5      No, see my previous comment
6      Yes, Amber was wonderful!!!
run;

data want;
  set have;
  var1=scan(oldvar1,1);
  c=indexw(oldvar1,scan(oldvar1,2));
  if c>0 then var2=substr(oldvar1,c);
run;

The leading "&" in the informat used in the INPUT statement tells sas not to allow a single blank to terminate the incoming character value (but a double blanks would terminate it).

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
LEINAARE
Obsidian | Level 7

Thank you @Kurt_Bremser@mkeintz,

 

The code that both of you sent worked to solve the issue.  I accepted @Kurt_Bremser because he posted his solution first.  Thank you both again.  This will give me some new code to experiment with.

 

Ted