4 hours ago
I have data set like this with more than 50 variables
Doxycycline_100MG_PO_BID_X_07 | ACYCLOVIR_400MG_TID_X_7_DAYS | METRONIDAZOLE_500MG_POBID_X14D |
DOXYCYCLINE 100MG PO BID X 07 | ||
ACYCLOVIR 400MG TID X 7 DAYS | METRONIDAZOLE 500MG POBID X14D | |
DOXYCYCLINE 100MG PO BID X 07 | METRONIDAZOLE 500MG POBID X14D | |
ACYCLOVIR 400MG TID X 7 DAYS |
I am using if then statement like this - if METRONIDAZOLE_500MG_POBID_X14D="METRONIDAZOLE 500MG POBID X14D" then METRONIDAZOLE_500MG_POBID_X14D="1"; else METRONIDAZOLE_500MG_POBID_X14D="0"; There are more than 50 variables and it is taking time to code with if then statement. Is there a way to do it with a single code?
I would like to have 1 if the response is present and 0 if there is no response for each variable.
How can I do it at once?
Could this work? I haven't done any real check to see that it does exactly what you want, but it should.
%macro recoding(lib_name, ds_name);
*Get all the string variables from the data set and put it into a macro variable.;
proc sql noprint;
select name into : allVars separated by " "
from dictionary.columns
where upcase(libname) = upcase("&lib_name") and upcase(memname) = upcase("&ds_name")
and type = "char";
quit;
*Recode the data;
data &lib_name..&ds_name._copy;
set &lib_name..&ds_name;
*Get the number of variables to loop.;
%let no_of_vars=%sysfunc(countw(&allVars,%str( )));
*Loop all the string variables and create the code for the different checks.;
%do _i = 1 %to &no_of_vars;
%let this_var_name = %scan(&allVars,&_i, %str( ));
if &this_var_name = "&this_var_name" then &this_var_name = "1";
else &this_var_name = "0";
%end;
run;
%mend recoding;
%recoding(sashelp, class);
This code takes all the string variables, if you can or want you can of course do some changes to make sure you don't get some of the variables that shouldn't be checked. But on the other hand, I don't think that your data will have other variables that has the variable name as content of the variable... 🙂
Assuming you just need to set a '1' if there is a value in the column, below should do.
data have;
infile datalines dlm='|' dsd truncover;
input (Doxycycline_100MG_PO_BID_X_07 ACYCLOVIR_400MG_TID_X_7_DAYS METRONIDAZOLE_500MG_POBID_X14D) (:$32.);
datalines;
DOXYCYCLINE 100MG PO BID X 07|ACYCLOVIR 400MG TID X 7 DAYS|METRONIDAZOLE 500MG POBID X14D
DOXYCYCLINE 100MG PO BID X 07||METRONIDAZOLE 500MG POBID X14D
;
run;
data want(drop=_:);
set have;
array var {*} Doxycycline_100MG_PO_BID_X_07 ACYCLOVIR_400MG_TID_X_7_DAYS METRONIDAZOLE_500MG_POBID_X14D;
do _i=1 to dim(var);
var[_i]= put(not missing(var[_i]),1.);
end;
run;
This gives you the 0 and 1 as you want but the variable lengths are now way too big. They should be $1 actually.
I didn't want to overload the code but there would be automated ways to recreate the variables. It would help though if you could tell us a bit more about your Have dataset - especially if it got also many other variables in it.
It may be possible if the condition you are actually searching for is that the content of a variable matches the variable name when replacing embedded blanks in the value with an _ character to match the name of the variable. Almost anything else may be very problematic.
You example data however does not show ANY values other than the variable name with blanks or a missing value (all blank). If that is the actual condition of your variable then this is easy, though I would tend to create additional variables that are actually numeric for a number of reasons with the current NAME of the variable as the LABEL for a numeric.
Such as:
data want; set have; array names Doxycycline_100MG_PO_BID_X_07 ACYCLOVIR_400MG_TID_X_7_DAYS METRONIDAZOLE_500MG_POBID_X14D ; array num{3}; do i = 1 to dim(names); num[i]= not missing(names[i]); end; label num1 ="Doxycycline_100MG_PO_BID_X_07" num2 ="ACYCLOVIR_400MG_TID_X_7_DAYS" num3 ="METRONIDAZOLE_500MG_POBID_X14D" ; run;
You may ask why I would suggest the actual numeric variable?
Reasons: 1) the logic assign will assign 1/0 without if/then/else
2) the overall Sum of any of the NUM variables will give you the total number of recorded instances
3) the overall Mean of any of the NUM variables will give a percentage of recorded instances
4) the row sum of the NUM variables (or selected group) will give the number recorded per observation
5) the row mean will give you a percentage of encountered drugs per observation
6) a MAX for an observation or grouping variable of 1 will tell you at least one observation, or of a group of drugs, had a least one recorded, a max of 0 says none recorded
And more depending on what you need. Almost any counts of drugs would be easier.
If a single variable has more values then you need to provide more details.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.