Hello,
I would like to extract qualitative responses after encountering a comma, following Yes/No responses and create two variables, one containing yes/no responses, and the other containing qualitative responses. Below is an example of the data as it is now, and what I would like.
Data as it is now:
ID Var1
1 Yes. The training was great!
2 No, it was not helpful.
3 Yes
4 YES, the part about patient-centered counseling
5 No, see my previous comment
6 Yes, Amber was wonderful!!!
What I would like:
ID Var1 Var2
1 Yes The training was great!
2 No it was not helpful.
3 Yes
4 Yes the part about patient-centered counseling
5 No see my previous comment
6 Yes Amber was wonderful!!!
Most observations have Yes/No only, or are blank. Only about 10% have qualitative responses included. This was a data entry error on the part of student assistants, but it occurs fairly often across different datasets and variables. I have used the SCAN function to identify observations that need modification, but would appreciate help creating a new variable to take the qualitative value (maintaining case and punctuation as it appears).
data want;
set have;
if scan(upcase(VAR1,1)) in ('YES','NO') then VAR2 = 'Modify';
/*I used VAR2 as a test to see if my use of the SCAN function worked to identify observations needing modification*/
run;
I would like help assigning the qualitative values to VAR2 as they appear after the comma in VAR1, stripping the leading space, and leaving VAR1 with only the vale of Yes/No.
Thank you!
Try this:
data want;
set have;
if countw(var1,' .,') > 1
then do;
index = indexc(var1,' ,.');
var2 = substr(var1,index+1);
var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
Try this:
data want;
set have;
if countw(var1,' .,') > 1
then do;
index = indexc(var1,' ,.');
var2 = substr(var1,index+1);
var1 = substr(var1,1,index-1);
end;
else if upcase(var1) in ('YES','NO') then var2 = 'Modify';
drop index;
run;
Use the INDEXW function to find the character position where the second word in VAR1 resides. Then use that as the starting position in a SUBSTR function, as in:
data have;
input ID oldVar1 :&$30.;
datalines;
1 Yes. The training was great!
2 No, it was not helpful.
3 Yes
4 YES, the part about patient-centered counseling
5 No, see my previous comment
6 Yes, Amber was wonderful!!!
run;
data want;
set have;
var1=scan(oldvar1,1);
c=indexw(oldvar1,scan(oldvar1,2));
if c>0 then var2=substr(oldvar1,c);
run;
The leading "&" in the informat used in the INPUT statement tells sas not to allow a single blank to terminate the incoming character value (but a double blanks would terminate it).
Thank you @Kurt_Bremser & @mkeintz,
The code that both of you sent worked to solve the issue. I accepted @Kurt_Bremser because he posted his solution first. Thank you both again. This will give me some new code to experiment with.
Ted
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.