Removing Characters From Unformated String

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 14
Accepted Solution

Removing Characters From Unformated String

Hello! I am trying to remove certain characters from unformated strings. What I mean by "unformated" is these variables have discrepancies so that a simple compress or substr won't work (or at least I don't think they would).

 

I have the following column from an xlsx file:

 

Here's what they look like ........ and here's what I'm interested in

1X4F--0754 ................................. 0754

1X4F-0016 .................................. 0016

1X4F-0023 .................................. 0023

1X4F-2008-01 ............................. 2008

1X4F-2008-02 ............................. ^

1X4F-330 .................................... 3300

 

Note that the first value had double dashes. This was entered manually and is a mistake. Also note that 2008-01 and 2008-02 are the same and should be one value. The last value is incomplete.

 

Is there a way for me to reshape this?

 

 

Thank you in advance,

Yawen

 

 


Accepted Solutions
Solution
‎04-06-2017 05:53 PM
PROC Star
Posts: 295

Re: Removing Characters From Unformated String

The double dashes can be dealt with via prxchange. The duplicate values could be dealt with with nodupkey sort, but it would depend on the rest of your data.

 

data have;
input str :$12.;
str2 = scan(prxchange('s/--/-/', -1, str), 2, '-');
datalines;
1X4F--0754
1X4F-0016
1X4F-0023
;
run;

View solution in original post


All Replies
Solution
‎04-06-2017 05:53 PM
PROC Star
Posts: 295

Re: Removing Characters From Unformated String

The double dashes can be dealt with via prxchange. The duplicate values could be dealt with with nodupkey sort, but it would depend on the rest of your data.

 

data have;
input str :$12.;
str2 = scan(prxchange('s/--/-/', -1, str), 2, '-');
datalines;
1X4F--0754
1X4F-0016
1X4F-0023
;
run;
Super User
Posts: 19,136

Re: Removing Characters From Unformated String

You can look at the SCAN() function as well. The modifiers will help. 

Super User
Posts: 11,128

Re: Removing Characters From Unformated String

This may get you started:

data example;
   infile datalines truncover;
   informat x $13.;
   input x;
   length y $ 4;
   y= scan(x,2,'-',);
datalines;
1X4F--0754
1X4F-0016
1X4F-0023 
1X4F-2008-01 
1X4F-2008-02
1X4F-330 
;
run;

If the "incomplete" is always handled the same way you could add a test for

 

if length(y) < 4 then y=catt(y,'0'); if the action needed is to always place a 0 at the end. But that was not specified by your request, you just said is was incomplete.

Occasional Contributor
Posts: 14

Re: Removing Characters From Unformated String

You guys were all correct to suggest the scan function! It totally fixed my problem. Thank you all!

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 226 views
  • 4 likes
  • 4 in conversation