I have a pretest survey and posttest survey with 4 categories and missing value. I would like to combine 'Definitely yes' and 'Probably yes' into "Yes", 'Probably not' and 'Definitely not' into "No" so each survey question will have "yes, no and missing", three categories in total. Since I have a list of 14 questions like this, I would like to develop a macro so I don't need to write 14 redundant codes like below:
data Smoke1;
set Smoke;
/*pretest question 1*/
if pre_B1_dan_c = 'Definitely yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably not' then pre_B1_dan_c_D = " No";
if pre_B1_dan_c = 'Definitely not' then pre_B1_dan_c_D = " No";
/*posttest question 1*/
if po_B1_dan_c = 'Definitely yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably not' then po_B1_dan_c_D = " No";
if po_B1_dan_c = 'Definitely not' then po_B1_dan_c_D = " No";
run;
Could anyone please help me with the macro? Thanks!
You can use a format :
proc format;
value $yn
"Definitely yes", "Probably yes"="Yes"
"Definitely not", "Probably not"="No"
other=" ";
run;
data have;
length pre_B1_dan_c po_B1_dan_c $15.;
pre_B1_dan_c="Definitely yes"; po_B1_dan_c="Probably not"; output;
pre_B1_dan_c="Probably yes"; po_B1_dan_c="Definitely not"; output;
pre_B1_dan_c=" "; po_B1_dan_c="Probably not"; output;
run;
data want;
set have;
pre_B1_dan_c=put(pre_B1_dan_c,$yn.);
po_B1_dan_c=put(po_B1_dan_c,$yn.);
run;
Use a format:
proc format; value $ynm "Definately yes","Probably yes"="Yes" other="Yes"; run; data smoke1; set smoke; format pre_b1_dan_c po_b1_dan_c $ynm.; run;
Note, not tested as you have not supplied any test data in the form of a datastep which we always ask for and is mentioned under the Post question button.
You can use a format :
proc format;
value $yn
"Definitely yes", "Probably yes"="Yes"
"Definitely not", "Probably not"="No"
other=" ";
run;
data have;
length pre_B1_dan_c po_B1_dan_c $15.;
pre_B1_dan_c="Definitely yes"; po_B1_dan_c="Probably not"; output;
pre_B1_dan_c="Probably yes"; po_B1_dan_c="Definitely not"; output;
pre_B1_dan_c=" "; po_B1_dan_c="Probably not"; output;
run;
data want;
set have;
pre_B1_dan_c=put(pre_B1_dan_c,$yn.);
po_B1_dan_c=put(po_B1_dan_c,$yn.);
run;
Thanks for your suggestions. I am attaching my dataset here.
Below is the list of variables to be dichotimized:
pre_B1_dan_c po_B1_dan_c pre_B2_dan_ec po_B2_dan_ec pre_B3_harm_c po_B3_harm_c
pre_B4_harm_ec po_B4_harm_ec pre_B5_quit_ec po_B5_quit_ec pre_B6_addict_c po_B6_addict_c pre_B7_addict_ec po_B7_addict_ec pre_B8_media po_B8_media
I was testing the proc format code that you suggested, but I would like to keep the original variable "pre_B1_dan_c" and create a new variable "pre_B1_dan_c_D" with dichotimize categories. This way I will have both raw information and combined categories in the dataset.
There are 3 pairs of questions ( pre_B6_addict_c, po_B6_addict_c, pre_B7_addict_ec, po_B7_addict_ec, pre_B8_media and po_B8_media) have different categories to dichotomize, please see format $ynB.
proc format;
value $ynA
"Definitely yes", "Probably yes"="Yes"
"Definitely not", "Probably not"="No"
other=" ";
value $ynB
"Very likely", "Somewhat likely" = "Yes"
"Neither likely or unlikely", 'Somewhat unlikely', 'Very unlikely' = "No";
run;
Could you please help me with refining my code? Thank you!
Right now my code is as below, but it is very long if I include the all the variables.....
data Smoke1;
set Smoke;
if pre_B1_dan_c = 'Definitely yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably not' then pre_B1_dan_c_D = " No";
if pre_B1_dan_c = 'Definitely not' then pre_B1_dan_c_D = " No";
if po_B1_dan_c = 'Definitely yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably not' then po_B1_dan_c_D = " No";
if po_B1_dan_c = 'Definitely not' then po_B1_dan_c_D = " No";
if pre_B2_dan_ec = 'Definitely yes' then pre_B2_dan_ec_D = " Yes";
if pre_B2_dan_ec = 'Probably yes' then pre_B2_dan_ec_D = " Yes";
if pre_B2_dan_ec = 'Probably not' then pre_B2_dan_ec_D = " No";
if pre_B2_dan_ec = 'Definitely not' then pre_B2_dan_ec_D = " No";
if po_B2_dan_ec = 'Definitely yes' then po_B2_dan_ec_D = " Yes";
if po_B2_dan_ec = 'Probably yes' then po_B2_dan_ec_D = " Yes";
if po_B2_dan_ec = 'Probably not' then po_B2_dan_ec_D = " No";
if po_B2_dan_ec = 'Definitely not' then po_B2_dan_ec_D = " No";
if pre_B3_harm_c = 'Definitely yes' then pre_B3_harm_c_D = " Yes";
if pre_B3_harm_c = 'Probably yes' then pre_B3_harm_c_D = " Yes";
if pre_B3_harm_c = 'Probably not' then pre_B3_harm_c_D = " No";
if pre_B3_harm_c = 'Definitely not' then pre_B3_harm_c_D = " No";
if po_B3_harm_c = 'Definitely yes' then po_B3_harm_c_D = " Yes";
if po_B3_harm_c = 'Probably yes' then po_B3_harm_c_D = " Yes";
if po_B3_harm_c = 'Probably not' then po_B3_harm_c_D = " No";
if po_B3_harm_c = 'Definitely not' then po_B3_harm_c_D = " No";
run;
I am not familiar with the array code...... right now I am still using less simplified code below and got what I want. But I was hoping to refine it using an easier way........
data Smoke1;
set Smoke;
if pre_B1_dan_c = 'Definitely yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably yes' then pre_B1_dan_c_D = " Yes";
if pre_B1_dan_c = 'Probably not' then pre_B1_dan_c_D = "No";
if pre_B1_dan_c = 'Definitely not' then pre_B1_dan_c_D = "No";
if po_B1_dan_c = 'Definitely yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably yes' then po_B1_dan_c_D = " Yes";
if po_B1_dan_c = 'Probably not' then po_B1_dan_c_D = "No";
if po_B1_dan_c = 'Definitely not' then po_B1_dan_c_D = "No";
if pre_B2_dan_ec = 'Definitely yes' then pre_B2_dan_ec_D = " Yes";
if pre_B2_dan_ec = 'Probably yes' then pre_B2_dan_ec_D = " Yes";
if pre_B2_dan_ec = 'Probably not' then pre_B2_dan_ec_D = "No";
if pre_B2_dan_ec = 'Definitely not' then pre_B2_dan_ec_D = "No";
if po_B2_dan_ec = 'Definitely yes' then po_B2_dan_ec_D = " Yes";
if po_B2_dan_ec = 'Probably yes' then po_B2_dan_ec_D = " Yes";
if po_B2_dan_ec = 'Probably not' then po_B2_dan_ec_D = "No";
if po_B2_dan_ec = 'Definitely not' then po_B2_dan_ec_D = "No";
if pre_B3_harm_c = 'Definitely yes' then pre_B3_harm_c_D = " Yes";
if pre_B3_harm_c = 'Probably yes' then pre_B3_harm_c_D = " Yes";
if pre_B3_harm_c = 'Probably not' then pre_B3_harm_c_D = "No";
if pre_B3_harm_c = 'Definitely not' then pre_B3_harm_c_D = "No";
if po_B3_harm_c = 'Definitely yes' then po_B3_harm_c_D = " Yes";
if po_B3_harm_c = 'Probably yes' then po_B3_harm_c_D = " Yes";
if po_B3_harm_c = 'Probably not' then po_B3_harm_c_D = "No";
if po_B3_harm_c = 'Definitely not' then po_B3_harm_c_D = "No";
if pre_B4_harm_ec = 'Definitely yes' then pre_B4_harm_ec_D = " Yes";
if pre_B4_harm_ec = 'Probably yes' then pre_B4_harm_ec_D = " Yes";
if pre_B4_harm_ec = 'Probably not' then pre_B4_harm_ec_D = "No";
if pre_B4_harm_ec = 'Definitely not' then pre_B4_harm_ec_D = "No";
if po_B4_harm_ec = 'Definitely yes' then po_B4_harm_ec_D = " Yes";
if po_B4_harm_ec = 'Probably yes' then po_B4_harm_ec_D = " Yes";
if po_B4_harm_ec = 'Probably not' then po_B4_harm_ec_D = "No";
if po_B4_harm_ec = 'Definitely not' then po_B4_harm_ec_D = "No";
if pre_B5_quit_ec = 'Definitely yes' then pre_B5_quit_ec_D = " Yes";
if pre_B5_quit_ec = 'Probably yes' then pre_B5_quit_ec_D = " Yes";
if pre_B5_quit_ec = 'Probably not' then pre_B5_quit_ec_D = "No";
if pre_B5_quit_ec = 'Definitely not' then pre_B5_quit_ec_D = "No";
if po_B5_quit_ec = 'Definitely yes' then po_B5_quit_ec_D = " Yes";
if po_B5_quit_ec = 'Probably yes' then po_B5_quit_ec_D = " Yes";
if po_B5_quit_ec = 'Probably not' then po_B5_quit_ec_D = "No";
if po_B5_quit_ec = 'Definitely not' then po_B5_quit_ec_D = "No";
if pre_B6_addict_c = 'Very likely' then pre_B6_addict_c_D = " Yes";
if pre_B6_addict_c = 'Somewhat likely' then pre_B6_addict_c_D = " Yes";
if pre_B6_addict_c = 'Neither likely or unlikely' then pre_B6_addict_c_D = "No";
if pre_B6_addict_c = 'Somewhat unlikely' then pre_B6_addict_c_D = "No";
if pre_B6_addict_c = 'Very unlikely' then pre_B6_addict_c_D = "No";
if po_B6_addict_c = 'Very likely' then po_B6_addict_c_D = " Yes";
if po_B6_addict_c = 'Somewhat likely' then po_B6_addict_c_D = " Yes";
if po_B6_addict_c = 'Neither likely or unlikely' then po_B6_addict_c_D = "No";
if po_B6_addict_c = 'Somewhat unlikely' then po_B6_addict_c_D = "No";
if po_B6_addict_c = 'Very unlikely' then po_B6_addict_c_D = "No";
if pre_B7_addict_ec = 'Very likely' then pre_B7_addict_ec_D = " Yes";
if pre_B7_addict_ec = 'Somewhat likely' then pre_B7_addict_ec_D = " Yes";
if pre_B7_addict_ec = 'Neither likely or unlikely' then pre_B7_addict_ec_D = "No";
if pre_B7_addict_ec = 'Somewhat unlikely' then pre_B7_addict_ec_D = "No";
if pre_B7_addict_ec = 'Very unlikely' then pre_B7_addict_ec_D = "No";
if po_B7_addict_ec = 'Very likely' then po_B7_addict_ec_D = " Yes";
if po_B7_addict_ec = 'Somewhat likely' then po_B7_addict_ec_D = " Yes";
if po_B7_addict_ec = 'Neither likely or unlikely' then po_B7_addict_ec_D = "No";
if po_B7_addict_ec = 'Somewhat unlikely' then po_B7_addict_ec_D = "No";
if po_B7_addict_ec = 'Very unlikely' then po_B7_addict_ec_D = "No";
if pre_B8_media = 'Very likely' then pre_B8_media_D = " Yes";
if pre_B8_media = 'Somewhat likely' then pre_B8_media_D = " Yes";
if pre_B8_media = 'Neither likely or unlikely' then pre_B8_media_D = "No";
if pre_B8_media = 'Somewhat unlikely' then pre_B8_media_D = "No";
if pre_B8_media = 'Very unlikely' then pre_B8_media_D = "No";
if po_B8_media = 'Very likely' then po_B8_media_D = " Yes";
if po_B8_media = 'Somewhat likely' then po_B8_media_D = " Yes";
if po_B8_media = 'Neither likely or unlikely' then po_B8_media_D = "No";
if po_B8_media = 'Somewhat unlikely' then po_B8_media_D = "No";
if po_B8_media = 'Very unlikely' then po_B8_media_D = "No";
run;
The first & third examples in the link exactly your situation, recoding variables. Is there something there that's not making sense? Did you have any issues running the example code?
@Denali wrote:
I am not familiar with the array code...... right now I am still using less simplified code below and got what I want. But I was hoping to refine it using an easier way........
Here is a solution using sashelp.vcolumn to determine the columns to transform :
proc format;
value $yn
"Definitely yes", "Probably yes"="Yes"
"Definitely not", "Probably not"="No"
other=" ";
run;
data have;
length pre_B1_dan_c po_B1_dan_c $15.;
pre_B1_dan_c="Definitely yes"; po_B1_dan_c="Probably not"; output;
pre_B1_dan_c="Probably yes"; po_B1_dan_c="Definitely not"; output;
pre_B1_dan_c=" "; po_B1_dan_c="Probably not"; output;
run;
data _NULL_;
call execute("data want; set have;");
do until(eof);
set sashelp.vcolumn end=eof;
where LIBNAME="WORK" and MEMNAME="HAVE"
and (upcase(NAME)=:"PRE_B" or upcase(NAME)=:"PO_B");
call execute('length '||cats(NAME,'_D')||' $3;');
call execute(cats(NAME, '_D=put(',NAME,',$yn.);'));
end;
call execute('run;');
stop;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.