My dataset looks like this
And I want it to look like this
Subject Code site subj
0156 00062 156 62
0156 00062 156 62
0047 00032 47 32
0034 00066 34 66
0032 00029 32 29
.
.
The Output i get:
site=15600062
subj=1560
As you can see sas takes out the leading 0 values and the space " ", because of which it's difficult to split.
One way
data have;
input subjectcode $&10.;
datalines;
0156 00062
0156 00062
0047 00032
0034 00066
0032 00029
;
data want;
set have;
site=prxchange('s/0*([1-9]+) 0*([1-9]+)/$1/', -1, subjectcode);
subj=prxchange('s/0*([1-9]+) 0*([1-9]+)/$2/', -1, subjectcode);
run;
Thank you for your reply,
I'm using the proc import to read a text file which contains the variable
PROC IMPORT OUT=manifest
datafile="../../sd/accumulative/manifests/manifestcomb.txt"
REPLACE ;
delimiter='09'x;
guessingrows=32767;
RUN;
quit;
when sas reads the variable "Study Code"n (This is one variable), it takes out space and the leading 0's Eg:
0156 00062 |
becomes: 15600062,
is there something that i need to do in the PROC import to keep the formatting?
@Kp1234 wrote:
Thank you for your reply,
I'm using the proc import to read a text file which contains the variable
PROC IMPORT OUT=manifest
datafile="../../sd/accumulative/manifests/manifestcomb.txt"
REPLACE ;
delimiter='09'x;
guessingrows=32767;
RUN;
quit;
when sas reads the variable "Study Code"n (This is one variable), it takes out space and the leading 0's Eg:
0156 00062 becomes: 15600062,
is there something that i need to do in the PROC import to keep the formatting?
Using PROC IMPORT will introduce some inconsistency in how the variables are defined since it can only guess what you want. But it will NOT remove an embedded spaces. Sounds like your text file does not have spaces between the two parts of the field. And since that makes it look like a numeric value PROC IMPORT will make a numeric variable instead of a character variable.
The real fix is to just write your own program to read the text file instead of using PROC IMPORT. That will also allow you to fix the goofy variable name with spaces in the middle of it that was used in the code in the original question.
If you already have a numeric variable and you want it to print using nine digits then use the Z9. format.
format subject_code z9.;
Or use the Z9. format with a PUT() function call to make a character variable.
subject_code_string=put(subject_code,z9.);
If you have read actual read the two separate values as individual numeric variables and want to generate the 9 digit string then use the PUT() function on each number separately.
subject_code=put(site,z4.)||put(subj,Z5.);
Hi @Kp1234
You can use the compress function to remove the zeros :
data have;
infile datalines truncover;
input "Subject Code"n $10.;
datalines;
0156 00062
0156 00062
0047 00032
0034 00066
0032 00029
;
run;
data want;
set have;
if "Subject Code"n ^="" then do;
site = input(compress(scan("Subject Code"n,1,' '),"0"),best.);
subj = input(compress(scan("Subject Code"n,-1),"0"),best.);
end;
run;
Does the source variable always have 4 digits, space and then 5 digits? Is then just use SUBSTR() to split the field.
site=substr(subject_code,1,4);
subj=substr(subject_code,6,5);
Or are you trying to create numeric variables instead of character variables? Numeric variables do not have any concept of leading zeros. But you can attach a format that will make the values print with leading zeros.
site=input(subject_code,4.);
subj=input(substr(subject_code,6),5.);
format site z4. subj z5.;
data have;
input Subject_Code $12.;
cards;
0156 00062 156 62
0156 00062 156 62
0047 00032 47 32
0034 00066 34 66
0032 00029 32 29
;
data want;
set have;
length subj site $5;
subj=substr(scan(subject_code,1),verify(scan(subject_code,1),'0'));
site=substr(scan(subject_code,-1),verify(scan(subject_code,-1),'0'));
run;
thank you for all your reply, much appreciated.
I'm using the proc import to read a text file which contains the variable
PROC IMPORT OUT=manifest
datafile="../../sd/accumulative/manifests/manifestcomb.txt"
REPLACE ;
delimiter='09'x;
guessingrows=32767;
RUN;
quit;
when sas reads the variable "Study Code"n (This is one variable), it takes out space and the leading 0's Eg:
0156 00062 |
becomes: 15600062,
is there something that I need to do in the PROC import to keep the formatting?
You're on the right track. These are nominal numbers, so keep them as alpha
data have;
length site subj $5;
infile datalines;
input @1 "Subject Code"n $11.;
if "Subject Code"n ^="" then
site=scan("Subject Code"n, 1);
put site=;
if "Subject Code"n ^="" then
subj=scan("Subject Code"n, -1);
put subj=;
datalines;
0156 00062
0156 00062
0047 00032
0034 00066
0032 00029
;
proc print;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.