I got this coding exercise to extract certain length of characters from a given data set:
Data Example;
Input abc $12.;
Datalines;
083154564684
783480645621
384563604852
;
Run;
The task is to extract the # from 2nd to 6th digits. So I put num = SUBSTR(abc,2,5);. But instead it returned an extraction from 1st to 5th digits, aka "08315, 78348, 38456". Why is it???
Then I looked into this Example data set and found out all the last digits of those numbers were missing... I'm confused now and not sure if the above problem is somewhat related to it.
(BTW I'm a beginner and I use the SAS University Edition for Windows.)
your input stmt is reading the tab character you entered while typing the data. With your code I got the same results as you. When I removed it (see below) I got the right answer.
Data Example;
Input abc $12.;
Datalines;
083154564684
783480645621
384563604852
;
Run;
data test;
set example;
num = SUBSTR(abc,2,5);
run;
proc print;run;
Data Example;
Input abc $12.;
Datalines;
083154564684
783480645621
384563604852
;
Run;
data want;
set example;
num = SUBSTR(abc,2,5);
run;
proc print noobs;run;
abc num
08315456 83154
78348064 83480
38456360 84563
What's the problem with your test?
Thank you for your reply 😄
Mine gave me the result like the following -
08315456468 --> 08315
78348064562 --> 78348
38456360485 --> 38456
I just tried and set the second parameter as 3 instead of 2, this way could get the correct output...(???)
Plz post your
1. full code
2 log
3 output
so the community members can give you an informed suggestion
Data Example; Input abc $12.; Datalines; 083154564684 783480645621 384563604852 ; Run; data test; set example; num = SUBSTR(abc,2,5); run;
Jeez, that's strange. Okay, is there any chance of an unlikely occurence of a leading blank or control characters?
If at all there is, can you try STRIPPING the blanks?
num = SUBSTR(strip(abc),2,5);
Nah. No luck with your solution.
But you did give me the idea of what's wrong. I realized the leading tab I entered before the data caused the problem... So the position of those digits shifted in the output.
Forgive me of this stupid mistake (facepalm)
And thanks for your help!
Hello. Please refer to my reply to @novinosrin
Thank you!
your input stmt is reading the tab character you entered while typing the data. With your code I got the same results as you. When I removed it (see below) I got the right answer.
Data Example;
Input abc $12.;
Datalines;
083154564684
783480645621
384563604852
;
Run;
data test;
set example;
num = SUBSTR(abc,2,5);
run;
proc print;run;
good eye
Hello. Yes, I found out about it just a minute ago... Such a rookie mistake @_@.
Thanks very much!
Don't let the bugs in the auto-indent function of the SAS editors cause you to generate difficult to maintain SAS code.
Always place the DATALINES (aka CARDS) statement in the FIRST column.
Once you have done that the auto-indent should put the cursor in the first column for the lines of data.
data Example;
Input abc $12.;
datalines;
083154564684
783480645621
384563604852
;
Also do not enter tabs into program files. There are options in the various SAS editors to replace tabs with spaces. You can still use the TAB key while typing to help keep your program lines indented consistently, but the presence of tabs in the actual file just leads to this type of confusion.
Hello Tom. I appreciate the information and advice you've shared.
People here are so helpful and I've learned a lot from you guys just on this one simple/basic problem, which is very encouraging for me as a beginner.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.