BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
yawenyu
Obsidian | Level 7

Hello! I am trying to remove certain characters from unformated strings. What I mean by "unformated" is these variables have discrepancies so that a simple compress or substr won't work (or at least I don't think they would).

 

I have the following column from an xlsx file:

 

Here's what they look like ........ and here's what I'm interested in

1X4F--0754 ................................. 0754

1X4F-0016 .................................. 0016

1X4F-0023 .................................. 0023

1X4F-2008-01 ............................. 2008

1X4F-2008-02 ............................. ^

1X4F-330 .................................... 3300

 

Note that the first value had double dashes. This was entered manually and is a mistake. Also note that 2008-01 and 2008-02 are the same and should be one value. The last value is incomplete.

 

Is there a way for me to reshape this?

 

 

Thank you in advance,

Yawen

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
collinelliot
Barite | Level 11

The double dashes can be dealt with via prxchange. The duplicate values could be dealt with with nodupkey sort, but it would depend on the rest of your data.

 

data have;
input str :$12.;
str2 = scan(prxchange('s/--/-/', -1, str), 2, '-');
datalines;
1X4F--0754
1X4F-0016
1X4F-0023
;
run;

View solution in original post

4 REPLIES 4
collinelliot
Barite | Level 11

The double dashes can be dealt with via prxchange. The duplicate values could be dealt with with nodupkey sort, but it would depend on the rest of your data.

 

data have;
input str :$12.;
str2 = scan(prxchange('s/--/-/', -1, str), 2, '-');
datalines;
1X4F--0754
1X4F-0016
1X4F-0023
;
run;
Reeza
Super User

You can look at the SCAN() function as well. The modifiers will help. 

ballardw
Super User

This may get you started:

data example;
   infile datalines truncover;
   informat x $13.;
   input x;
   length y $ 4;
   y= scan(x,2,'-',);
datalines;
1X4F--0754
1X4F-0016
1X4F-0023 
1X4F-2008-01 
1X4F-2008-02
1X4F-330 
;
run;

If the "incomplete" is always handled the same way you could add a test for

 

if length(y) < 4 then y=catt(y,'0'); if the action needed is to always place a 0 at the end. But that was not specified by your request, you just said is was incomplete.

yawenyu
Obsidian | Level 7

You guys were all correct to suggest the scan function! It totally fixed my problem. Thank you all!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 659 views
  • 4 likes
  • 4 in conversation