Hello,
I would like to extract qualitative responses after encountering a comma, following Yes/No responses and create two variables, one containing yes/no responses, and the other containing qualitative responses. Below is an example of the data as it is now, and what I would like.
Data as it is now:
ID Var1
1 Yes. The training was great!
2 No, it was not helpful.
3 Yes
4 YES, the part about patient-centered counseling
5 No, see my previous comment
6 Yes, Amber was wonderful!!!
What I would like:
ID Var1 Var2
1 Yes The training was great!
2 No it was not helpful.
3 Yes
4 Yes the part about patient-centered counseling
5 No see my previous comment
6 Yes Amber was wonderful!!!
Most observations have Yes/No only, or are blank. Only about 10% have qualitative responses included. This was a data entry error on the part of student assistants, but it occurs fairly often across different datasets and variables. I have used the SCAN function to identify observations that need modification, but would appreciate help creating a new variable to take the qualitative value (maintaining case and punctuation as it appears).
data want; set have; if scan(upcase(VAR1,1)) in ('YES','NO') then VAR2 = 'Modify'; /*I used VAR2 as a test to see if my use of the SCAN function worked to identify observations needing modification*/ run;
I would like help assigning the qualitative values to VAR2 as they appear after the comma in VAR1, stripping the leading space, and leaving VAR1 with only the vale of Yes/No.
Thank you!
Try this:
data want;
set have;
if countw(var1,' .,') > 1
then do;
index = indexc(var1,' ,.');
var2 = substr(var1,index+1);
var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
Try this:
data want;
set have;
if countw(var1,' .,') > 1
then do;
index = indexc(var1,' ,.');
var2 = substr(var1,index+1);
var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
Use the INDEXW function to find the character position where the second word in VAR1 resides. Then use that as the starting position in a SUBSTR function, as in:
data have;
input ID oldVar1 :&$30.;
datalines;
1 Yes. The training was great!
2 No, it was not helpful.
3 Yes
4 YES, the part about patient-centered counseling
5 No, see my previous comment
6 Yes, Amber was wonderful!!!
run;
data want;
set have;
var1=scan(oldvar1,1);
c=indexw(oldvar1,scan(oldvar1,2));
if c>0 then var2=substr(oldvar1,c);
run;
The leading "&" in the informat used in the INPUT statement tells sas not to allow a single blank to terminate the incoming character value (but a double blanks would terminate it).
Thank you @Kurt_Bremser & @mkeintz,
The code that both of you sent worked to solve the issue. I accepted @Kurt_Bremser because he posted his solution first. Thank you both again. This will give me some new code to experiment with.
Ted
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.