Hi,
Hope you are doing well!
I have variable called mdotsp, contains values of missing number of tablets but it has different keywords such as "pill", "TAB", etc. I am trying to create numeric variable called mnmtb to assign a value from mdotsp which contains a number of missing pills.
I am using below code to check series of conditional statements based on keywords. I have large data but here I have provided some example data. In below I am getting mnmtb value as missing. I am not sure why, can anyone help me with this logic where I am missing?
data have;
input mdotsp $50.;
datalines;
1PILL
2TAB
3MINI
4PIECE
#5
5#
;
run;
data want;
length mnmtb 8;
set have;
mdotsp = compress(upcase(mdotsp)); /* Normalize mdotsp for case and spaces */
/* Loop through 1 to 300 for mnmtb */
do n = 5 to 1 by -1;
/* Create potential keywords dynamically */
pill_keyword = cats(n, "PILL");
one_pill_keyword = cats(scan(put(n, best.), 1), "PILL");
tab_keyword = cats(n, "TAB");
one_tab_keyword = cats(scan(put(n, best.), 1), "TAB");
mini_keyword = cats(n, "MINI");
one_mini_keyword = cats(scan(put(n, best.), 1), "MINI");
piece_keyword = cats(n, "PIECE");
one_piece_keyword = cats(scan(put(n, best.), 1), "PIECE");
hash_keyword = cats(n, "#");
hash_rev_keyword = cats("#", n);
/* Check if any of these keywords are present in mdotsp */
if index(mdotsp, pill_keyword) or index(mdotsp, one_pill_keyword) or
index(mdotsp, tab_keyword) or index(mdotsp, one_tab_keyword) or
index(mdotsp, mini_keyword) or index(mdotsp, one_mini_keyword) or
index(mdotsp, piece_keyword) or index(mdotsp, one_piece_keyword) or
index(mdotsp, hash_keyword) or index(mdotsp, hash_rev_keyword) then do;
mnmtb=n;
leave; /* Exit the loop once a match is found */
end;
end;
run;
Thanks,
chin
First problem: you only assign a value when the value is not missing:
if mnmtb ne . and
If your intent is to assign a value when MNMTB is missing then
if missing (mnmtb) and
Second is when you use a variable as the second parameter of INDEX (and many other character functions) then entire variable is used. Since you have not defined a length for any of your key word variables they are defaulting to 200 because of the CATS function. So the trailing spaces in the variable are used in the comparison and not found. So all of the index function calls are returning 0.
If N is never two digits you could get away with variable lengths to prevent that. OR remove the trailing spaces.
A drastically shortened example that returns a 1 for the 1PILL:
data want; length mnmtb 8; set have; mdotsp = compress(upcase(mdotsp)); /* Normalize mdotsp for case and spaces */ pill_keyword = cats(1, "PILL"); /* Check if any of these keywords are present in mdotsp */ if missing (mnmtb) and index(mdotsp, strip(pill_keyword)) then do; mnmtb=1; end; run;
/* Check if any of these keywords are present in mdotsp */
if mnmtb ne . and (index(mdotsp, pill_keyword) or index(mdotsp, one_pill_keyword) or
MNMTB has not yet been assigned a value, so when you test to see if it is not equal to missing , it is equal to missing and so this IF condition is always false and the rest of the DO; END; never executes.
It always helps, and it would help here tremendously, if you explain what you are trying to do in words, explain the purpose of this code. Do not discuss code or SAS, we need to know the purpose.
First problem: you only assign a value when the value is not missing:
if mnmtb ne . and
If your intent is to assign a value when MNMTB is missing then
if missing (mnmtb) and
Second is when you use a variable as the second parameter of INDEX (and many other character functions) then entire variable is used. Since you have not defined a length for any of your key word variables they are defaulting to 200 because of the CATS function. So the trailing spaces in the variable are used in the comparison and not found. So all of the index function calls are returning 0.
If N is never two digits you could get away with variable lengths to prevent that. OR remove the trailing spaces.
A drastically shortened example that returns a 1 for the 1PILL:
data want; length mnmtb 8; set have; mdotsp = compress(upcase(mdotsp)); /* Normalize mdotsp for case and spaces */ pill_keyword = cats(1, "PILL"); /* Check if any of these keywords are present in mdotsp */ if missing (mnmtb) and index(mdotsp, strip(pill_keyword)) then do; mnmtb=1; end; run;
Use the STRIP function.
IF your next steps don't really need all those composite "keywords" I would drop them from the data set as that is a lot of baggage of very little use that I can see.
Or perhaps research REGULAR expressions which can search for things like <digits>WORD values and return the Digit part if done correctly.
I think you can just use the magic of regular expressions for this:
data want;
set have;
n=prxchange('s/.*?(\d+).*/\1/', 1, mdotsp)*1;
run;
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.