BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Ken_nuga
Fluorite | Level 6

I have a patient unique ID, say "CT00001A" and I want to separate the prefix "CT" from the serial number "00001" and the suffix "A". Here is a program I used

DATA TEST;

SET sasuser.CIHP_RADET;

B = COMPRESS (A,"","A");

C = COMPRESS (A,B); RUN;

PROC PRINT DATA = TEST;

RUN;

 

and it returned 00001 in one column and CTA in another. Now I want to separate CT from A in 2 separate columns

1 ACCEPTED SOLUTION

Accepted Solutions
chaatak
Fluorite | Level 6

Please try this:

 

data new;
infile datalines;
input x $15.;

k1 = anydigit (x);
A = substr (x, 1, k1-1);
temp = substr(x,k1);
k2 = anyalpha (temp);
Num = substr(temp, 1, k2-1);
B = substr(temp, k2);
keep x A Num B;
datalines;
CT00010A
AMOPI00066
CT00069DV
CTPM00069B
OPEI00079DV
;

View solution in original post

9 REPLIES 9
Reeza
Super User

What are the rules? 
Why can't you just substring the first two characters using SUBSTR()?

 

Obviously this will need to be used on multiple observations so is it always the first two characters? Or first set of letters? How do you know A needs to be removed? What if it was "CT000B1A" what would be kept or is that not possible?

Ken_nuga
Fluorite | Level 6

Alright, here is a better explanation.

 

I have attached 2 SAS OUTPUT pictures and 2 Excel pictures of the dataset, the variables "prefix", " serial number" and "suffix" were meant to concatenate as "Patient Unique ID", but a data clerk erroneously copied the wrong thing.

 

The A or B after "CT00001A" is the suffix.

 

Thank you.

Ken_nuga
Fluorite | Level 6
The prefixes could be any of CT, TI, TII, PM, OPI, AMOPI, TB, OPEI, PNE. Therefore, it is not possible to substring the first two characters, but it is the first set of letters that needs to be moved into a separate column as "prefixes."
The alphabet that comes after the numbers always comes after the numbers because they were meant to be entered into a separate column as suffixes and then concatenated.
Presently, my challenge is how to separate the last alphabet from the following examples CTA, AMOPIB, OPEIC as the COMPRESS function groups all the alphabets into one column.
mkeintz
PROC Star

Your rules apparently expect the original value to have a alphabetic prefix, a numerical middle and an alphabetic suffix, in which each  component can have a varying length.  You can use SCAN to get the prefix and suffix, when you declare all digits as delimitrs.  And the middle is nothing but the original string with all alphabetic characters removed (i.e. compressed out):

 

data _null_;
  string='CTX00001A';
  str_prefix=scan(string,1,,'d');
  str_middle=compress(string,,'A');
  str_suffix=scan(string,2,,'d'); 
  put (str:) (=);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ken_nuga
Fluorite | Level 6

Many thanks, but I am actually trying this on a variable with several observations. Here is a short copy of it:

 

The SAS System

 

Obs

A

10

CT00010A

54

AMOPI00066

55

CT00069DV

56

CTPM00069B

67

OPEI00079DV

68

PM00080A

69

PM00080C

70

TI00080A

71

PM00080b

72

CT00081B

73

CT00081A

75

PM00084A

76

TI00084B

77

CT00085A

78

CT00085B

79

PM00086A

80

PM00086B

81

PM00088A

82

PM00088B

88

PM00092A

89

PM00092B

101

TI00117B

102

TI00117A

179

PM02414

200

PI03049A

201

PI03049B

 

Reeza
Super User

@Ken_nuga wrote:

Many thanks, but I am actually trying this on a variable with several observations. Here is a short copy of it:

  


It doesn't matter how many observations, the code is the same. You need to reference your data set and your variables, but the idea is correct. 

chaatak
Fluorite | Level 6

Please try this:

 

data new;
infile datalines;
input x $15.;

k1 = anydigit (x);
A = substr (x, 1, k1-1);
temp = substr(x,k1);
k2 = anyalpha (temp);
Num = substr(temp, 1, k2-1);
B = substr(temp, k2);
keep x A Num B;
datalines;
CT00010A
AMOPI00066
CT00069DV
CTPM00069B
OPEI00079DV
;

Ken_nuga
Fluorite | Level 6

Alright, it is done! Here is what I did and a partial print out.

 

proc import out = sasuser.CIHP_RADET datafile = 'C:\Users\ONUGHA\Desktop\A.xlsx' dbms=xlsx replace;

run;

 

DATA TEST;

SET sasuser.CIHP_RADET;

k1 = anydigit (A);

PUID_PREFIX = substr (A, 1, k1-1);

temp = substr(A,k1);

k2 = anyalpha (temp);

PUID = substr(temp, 1, k2-1);

PUID_SUFFIX = substr(temp, k2);

keep A PUID_PREFIX PUID PUID_SUFFIX;

RUN;

PROC PRINT DATA = TEST; RUN;

 

The SAS System

 

Obs

A

PUID_PREFIX

PUID

PUID_SUFFIX

1

PM00003DV

PM

00003

DV

2

CT00004MSG

CT

00004

MSG

3

OPI00004AMOPI

OPI

00004

AMOPI

4

CT/PI00005

CT/PI

00005

 

5

CCP/PM00006

CCP/PM

00006

 

6

CT00007

CT

00007

 

7

PM00008

PM

00008

 

8

OPI00008

OPI

00008

 

9

PM00009

PM

00009

 

10

CT00010A

CT

00010

A

11

PM00010

PM

00010

 

 

Great, thank you!

Reeza
Super User

@Ken_nuga in this case you shouldn't have marked your answer as the solution, @chaatak provided the correct solution and it's worth a few seconds to acknowledge that.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 4946 views
  • 3 likes
  • 4 in conversation