Hi
i want to separate character from numeric value
for eg :
Txt |
50mg |
slicing data |
100/50 mg |
mg |
unknown |
0.25 mmmgb |
how to get the output like this:
c | Txt_b |
50 | mg |
slicing data | |
100/50 | mg |
mg | |
unknown | |
0.25 | mmmgb |
Try this:
data want;
txt = "100/50 mg medicin";
c = compress(txt,"0123456789.,/ ","K"); *K is to keep the chars you specified.;
txt_b = compress(txt,"0123456789.,/");
run;
Try this:
data want;
txt = "100/50 mg medicin";
c = compress(txt,"0123456789.,/ ","K"); *K is to keep the chars you specified.;
txt_b = compress(txt,"0123456789.,/");
run;
If the accepted solution works to achieve your expected results, one recommendation I would make is regarding the "txt_b" line of logic.
As it stands the logic used to populate the "txt_b" field excludes the characters pulled in the "C" field above it, except for the blank space. This in turn causes any spaces between the numeric value and the character value to remain. So when you have "100/50 mg medicine", the blank between '50' and 'mg' remains. So to clean this up without incorrectly removing the blank between the words 'mg' and 'medicine', we can use the STRIP FUNCTION.
DATA WORK.Want;
SET WORK.Have;
c = COMPRESS(Txt,"0123456789.,/ ","K"); *K is to keep the chars you specified.;
txt_b = STRIP(COMPRESS(Txt,"0123456789.,/"));
RUN;
100/50 isn't a valid numeric because of the "/". You could split it into two numerics - 100 and 50. How do you want to handle this?
Here's an alternative option:
Obviously not as simple as the accepted solution, but still works regardless.
DATA WORK.WANT (DROP=Re Position Length Start Stop);
SET WORK.HAVE;
FORMAT C $10. Txt_B $15.;
IF _N_=1 THEN RE= PRXPARSE("m/0|1|2|3|4|5|6|7|8|9/oi");
RETAIN Re;
Start=1;
Stop=LENGTH(Txt);
IF PRXMATCH(Re,Txt) = 0 THEN DO;
C = ' ';
Txt_B = Txt;
END;
ELSE IF PRXMATCH(Re,Txt) > 0 THEN DO;
CALL PRXNEXT(Re, Start, Stop, Txt, Position, Length);
DO WHILE (Position > 0);
C = STRIP(SUBSTR(Txt, 1, Start-1));
Txt_B = STRIP(SUBSTR(Txt, Start, Stop));
CALL PRXNEXT(Re, Start, Stop, Txt, Position, Length);
END;
END;
RUN;
The main benefit to this option, over the other solution, is that depending on any other values in the text field, some issues may arise.
For example:
If there is a value of "15/100 mg/d", this is something completely made up just to show what could occur with the accepted solution logic.
As it stands that logic would produce C="15/100/" Txt_B=" mgd". So there would be an incorrect slash at the end of the value in the C field and it would be removed from the Txt_B field. Even if you removed the '/' from the Txt_B logic you would still have a bad value of "/mg/d" with the incorrect slash in the front now. If you will never have starting values like this, then you will be fine with the accepted solution.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.