Help using Base SAS procedures

To separate letters in a word

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 5
Accepted Solution

To separate letters in a word

I have a patient unique ID, say "CT00001A" and I want to separate the prefix "CT" from the serial number "00001" and the suffix "A". Here is a program I used

DATA TEST;

SET sasuser.CIHP_RADET;

B = COMPRESS (A,"","A");

C = COMPRESS (A,B); RUN;

PROC PRINT DATA = TEST;

RUN;

 

and it returned 00001 in one column and CTA in another. Now I want to separate CT from A in 2 separate columns


Accepted Solutions
Solution
‎11-02-2017 11:32 AM
Frequent Learner
Posts: 1

Re: To separate letters in a word

Please try this:

 

data new;
infile datalines;
input x $15.;

k1 = anydigit (x);
A = substr (x, 1, k1-1);
temp = substr(x,k1);
k2 = anyalpha (temp);
Num = substr(temp, 1, k2-1);
B = substr(temp, k2);
keep x A Num B;
datalines;
CT00010A
AMOPI00066
CT00069DV
CTPM00069B
OPEI00079DV
;

View solution in original post


All Replies
Super User
Posts: 23,662

Re: To separate letters in a word

What are the rules? 
Why can't you just substring the first two characters using SUBSTR()?

 

Obviously this will need to be used on multiple observations so is it always the first two characters? Or first set of letters? How do you know A needs to be removed? What if it was "CT000B1A" what would be kept or is that not possible?

Occasional Contributor
Posts: 5

Re: To separate letters in a word

Alright, here is a better explanation.

 

I have attached 2 SAS OUTPUT pictures and 2 Excel pictures of the dataset, the variables "prefix", " serial number" and "suffix" were meant to concatenate as "Patient Unique ID", but a data clerk erroneously copied the wrong thing.

 

The A or B after "CT00001A" is the suffix.

 

Thank you.

Occasional Contributor
Posts: 5

Re: To separate letters in a word

The prefixes could be any of CT, TI, TII, PM, OPI, AMOPI, TB, OPEI, PNE. Therefore, it is not possible to substring the first two characters, but it is the first set of letters that needs to be moved into a separate column as "prefixes."
The alphabet that comes after the numbers always comes after the numbers because they were meant to be entered into a separate column as suffixes and then concatenated.
Presently, my challenge is how to separate the last alphabet from the following examples CTA, AMOPIB, OPEIC as the COMPRESS function groups all the alphabets into one column.
Trusted Advisor
Posts: 1,337

Re: To separate letters in a word

Your rules apparently expect the original value to have a alphabetic prefix, a numerical middle and an alphabetic suffix, in which each  component can have a varying length.  You can use SCAN to get the prefix and suffix, when you declare all digits as delimitrs.  And the middle is nothing but the original string with all alphabetic characters removed (i.e. compressed out):

 

data _null_;
  string='CTX00001A';
  str_prefix=scan(string,1,,'d');
  str_middle=compress(string,,'A');
  str_suffix=scan(string,2,,'d'); 
  put (str:) (=);
run;
Occasional Contributor
Posts: 5

Re: To separate letters in a word

Many thanks, but I am actually trying this on a variable with several observations. Here is a short copy of it:

 

The SAS System

 

Obs

A

10

CT00010A

54

AMOPI00066

55

CT00069DV

56

CTPM00069B

67

OPEI00079DV

68

PM00080A

69

PM00080C

70

TI00080A

71

PM00080b

72

CT00081B

73

CT00081A

75

PM00084A

76

TI00084B

77

CT00085A

78

CT00085B

79

PM00086A

80

PM00086B

81

PM00088A

82

PM00088B

88

PM00092A

89

PM00092B

101

TI00117B

102

TI00117A

179

PM02414

200

PI03049A

201

PI03049B

 

Super User
Posts: 23,662

Re: To separate letters in a word


Ken_nuga wrote:

Many thanks, but I am actually trying this on a variable with several observations. Here is a short copy of it:

  


It doesn't matter how many observations, the code is the same. You need to reference your data set and your variables, but the idea is correct. 

Solution
‎11-02-2017 11:32 AM
Frequent Learner
Posts: 1

Re: To separate letters in a word

Please try this:

 

data new;
infile datalines;
input x $15.;

k1 = anydigit (x);
A = substr (x, 1, k1-1);
temp = substr(x,k1);
k2 = anyalpha (temp);
Num = substr(temp, 1, k2-1);
B = substr(temp, k2);
keep x A Num B;
datalines;
CT00010A
AMOPI00066
CT00069DV
CTPM00069B
OPEI00079DV
;

Occasional Contributor
Posts: 5

Re: To separate letters in a word

Alright, it is done! Here is what I did and a partial print out.

 

proc import out = sasuser.CIHP_RADET datafile = 'C:\Users\ONUGHA\Desktop\A.xlsx' dbms=xlsx replace;

run;

 

DATA TEST;

SET sasuser.CIHP_RADET;

k1 = anydigit (A);

PUID_PREFIX = substr (A, 1, k1-1);

temp = substr(A,k1);

k2 = anyalpha (temp);

PUID = substr(temp, 1, k2-1);

PUID_SUFFIX = substr(temp, k2);

keep A PUID_PREFIX PUID PUID_SUFFIX;

RUN;

PROC PRINT DATA = TEST; RUN;

 

The SAS System

 

Obs

A

PUID_PREFIX

PUID

PUID_SUFFIX

1

PM00003DV

PM

00003

DV

2

CT00004MSG

CT

00004

MSG

3

OPI00004AMOPI

OPI

00004

AMOPI

4

CT/PI00005

CT/PI

00005

 

5

CCP/PM00006

CCP/PM

00006

 

6

CT00007

CT

00007

 

7

PM00008

PM

00008

 

8

OPI00008

OPI

00008

 

9

PM00009

PM

00009

 

10

CT00010A

CT

00010

A

11

PM00010

PM

00010

 

 

Great, thank you!

Super User
Posts: 23,662

Re: To separate letters in a word

@Ken_nuga in this case you shouldn't have marked your answer as the solution, @chaatak provided the correct solution and it's worth a few seconds to acknowledge that.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 556 views
  • 3 likes
  • 4 in conversation