- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello,
This is my fist post, so please excuse any lack of etiquette. My question is how to remove the trailing zeros on a string. The following is my fictional dataset and below that is how I wish it would look after a datastep. Ignore the decimal point part of the code, since I need that for a subsequent step not related to dropping the trailing zeros. Also, the final dataset should be formatted as characters.
data icd9;
input id diag$;
length newdiag $6;
newdiag = substr(diag,1,3)||"."||
substr(diag,4,2);
datalines;
1 80500
2 80555
3 80550
4 e8500
5 e8550
6 e8555
7 v8500
8 v8550
9 v8555
10 08500
11 08550
12 08555
13 0414
14 041
15 04144
;
run;
Desired final product:
1 | 805 |
2 | 805.55 |
3 | 805.5 |
4 | e85 |
5 | e85.5 |
6 | e85.55 |
7 | v85 |
8 | v85.5 |
9 | v85.55 |
10 | 085 |
11 | 085.5 |
12 | 085.55 |
13 | 041.4 |
14 | 041 |
15 | 041.44 |
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
This look about right.
input id diag$;
length newdiag $6;
f = findc(diag,'0','TBK');
newdiag = substrn(diag,1,max(3,f));
newdiag = catx('.',substrn(newdiag,1,3),substrn(newdiag,4));
datalines;
1 80500
2 80555
3 80550
4 e8500
5 e8550
6 e8555
7 v8500
8 v8550
9 v8555
10 08500
11 08550
12 08555
13 0414
14 041
15 04144
;
run;
proc print;
run;
Message was edited by: data _null_
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you for the prompt post. Yes, this does seem to be what I was trying to achieve. Much thanks, in that this would have taken me a long time to figure out myself.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
My error, I thought that I included all of the scenarios but forgot about the following:
Hello,
This is my fist post, so please excuse any lack of etiquette. My question is how to remove the trailing zeros on a string. The following is my fictional dataset and below that is how I wish it would look after a datastep. Ignore the decimal point part of the code, since I need that for a subsequent step not related to dropping the trailing zeros. Also, the final dataset should be formatted as characters.
data icd9;
input id diag$;
length newdiag $6;
newdiag = substr(diag,1,3)||"."||
substr(diag,4,2);
datalines;
1 80500
2 80555
3 80550
4 e8500
5 e8550
6 e8555
7 v8500
8 v8550
9 v8555
10 08500
11 08550
12 08555
13 0414
14 041
15 04144
16 80000
17 85000
18 04000
;
run;
Desired final product:
1 | 805 |
2 | 805.55 |
3 | 805.5 |
4 | e85 |
5 | e85.5 |
6 | e85.55 |
7 | v85 |
8 | v85.5 |
9 | v85.55 |
10 | 085 |
11 | 085.5 |
12 | 085.55 |
13 | 041.4 |
14 | 041 |
15 | 041.44 |
16 | 800 |
17 | 850 |
18 | 040 |
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Another approach using Prxchange:
data icd9;
input id diag$;
length _new $6;
_new=catx('.',substr(diag,1,3),prxchange('s/([^0]?)(0+$)/$1/io',-1,compress(substr(diag,4))));
datalines;
1 80500
2 80000
3 80550
4 e8500
5 e8550
6 e8555
7 v8500
8 v8550
9 v40
10 08500
11 08550
12 08555
13 0414
14 041
15 04144
;
run;
Haikuo
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I prefer data null's solution - but here another variant using a bit a simpler RegEx:
data icd9;
input id diag :$5.;
length newdiag $ 6;
newdiag=prxchange('s/0{1,2} *$//oi',1,diag);
if length(newdiag)>3 then
newdiag = substrn(newdiag,1,3)||"."||substrn(newdiag,4);
datalines;
1 80500
2 80555
3 80550
4 e8500
5 e8550
6 e8555
7 v8500
8 v8550
9 v8555
10 08500
11 08550
12 08555
13 0414
14 041
15 04144
;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What does the 'TBK' do,data_null_?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Thank you everyone. All three options worked great and are very helpful.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
I'm glad you found some solutions, H, and your etiquette is right on. Thanks for marking "Correct" and "Helpful" answers. Those are most useful for others who may be searching for answers to a similar problem.
Anna
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Well since I received such expeditious and straightforward help, I am going to add hopefully one last twist. Since this seems to remain in the same content area, I won't opt to make this as a new thread. My question is as follows:
How would I best repeat this for for multiple variables/columns. I have diag1-diag60, and I would like all 60 variables transformed as discussed above? I could obviously do each one independently or in one big swoop. Though, I have not quite figured out how to do it with the multiple statements and variable names. Any help would be greatly appreciated.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Using an array should help, something like the following:
data icd9;
input id (diag1-diag4) (:$);
array diag diag1-diag4;
array _new $8 _new1-_new4;
do over _new;
_new=catx('.',substr(diag,1,3),prxchange('s/([^0]?)(0+$)/$1/io',-1,compress(substr(diag,4))));
end;
datalines;
1 80500 80500 80500 80500
2 80000 80000 80000 80000
3 80550 80550 80550 80550
4 e8500 e8500 e8500 e8500
5 e8550 e8550 e8550 e8550
6 e8555 e8555 e8555 e8555
7 v8500 v8500 v8500 v8500
8 v8550 v8550 v8550 v8550
9 v40 v40 v40 v40
10 08500 08500 08500 08500
11 08550 08550 08550 08550
12 08555 08555 08555 08555
13 0414 0414 0414 0414
14 041 041 041 041
15 04144 04144 04144 04144
;
run;
Haikuo