BookmarkSubscribeRSS Feed
S_Iftekhar
Calcite | Level 5
Hello,

I am sure this has been asked before, sorry cannot find it.

I am trying to sort a string variable in SAS 9.1, that appears in the following format
Variable (actually appears in a random order)
Drv1
Drv2
....

Drv10
Drv11
M1
M2
...

M10
M11
..


but when I sort this variable, Drv10 11 comes before Drv2 (so it becomes Drv1 Drv10 Drv11 ... Drv19 Drv2 ... for all the variables), How can I stop that and sort in a way that allows 2 to come before 10. ? Message was edited by: S.Iftekhar
12 REPLIES 12
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Extract the numeric portion and the character portion as two variables or recreate a "formatted" variable for sorting using another (DATA step, internal) variable and format it with SAS FORMAT like Z4.

Scott Barry
SBBWorks, Inc.
S_Iftekhar
Calcite | Level 5
Can you help with this process - "Extract the numeric portion and the character portion as two variables"

I guess if I can do this (seperate character and numeric variables), I can use a 'proc sort' with both variables in 'by' option, and then concatenate them.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Yes, you would use PROC SORT with the separated variables.

Use one of the various SAS functions within an assignment statement, such as:

data _null_;
string = 'ABC111.2222222222223';
char = substr(string,1,3);
num = input(substr(string,anydigit(string)),best.);
putlog _all_;
run;


Scott Barry
SBBWorks, Inc.
S_Iftekhar
Calcite | Level 5
Thanks for that. That did work for the numeric part.


But not working for the string part as the string part of the variable is not always 3 character long. How can I extract the character part without hardcoding the length?

the variable is always in a format (character)||(numeric) ... just the character part has varying legth.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Suggest you review the SAS documentation and apply what was provided as an example -- focus on the SUBSTR function further to assign the character-only portion.

Scott Barry
SBBWorks, Inc.
RickM
Fluorite | Level 6
If the first character is unique to the length of the string part you could use if statements.

Im sure there are better ways that would not involve hard coding though.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
Here's another hint - the ANYDIGIT function is important to identify the first numeric - consider using it with SUBSTR.

Scott Barry
SBBWorks, Inc.
Cynthia_sas
SAS Super FREQ
Hi:
Look at the examples of the COMPRESS function. You can use it to compress specific characters out of a text string. For example, if you compressed all Alpha characters out of a string, what would be left would be numbers and if you compressed all numbers out of a string, what would be left would be the alpha characters. There are some good examples of the COMPRESS function in the documentation.

In addition, the SCAN function will break a text string up into "chunks", based on a delimiter. So, for example, in the text strings:
[pre]
AA123
ZZZ4567
[/pre]

it doesn't matter how "long" the alpha characters' portion is...the "dividing" line between the alpha portion and the number portion is the 1st occurence of any number. The SCAN function will break a text string into chunks, based on the delimiter and/or modifier that you specify. Again, the documentation has some good examples of using the SCAN function.

When you use either the COMPRESS or the SCAN function, your digits/numbers will still be stored in character format until you convert them to be numeric variables using the INPUT function....only then will they sort correctly ... then they are numbers.

And, to continue the "fun with functions" you might also look at the ANYDIGIT function which will tell you the starting position of any digit in a character string. So, for example, with these 2 text strings, ANYDIGIT would tell you that the '1' is in position 3 (which means that the last alpha character is in position 2) and for the second string would tell you that the '1' is in position 5 (which means that the last alpha character is in position 4).
[pre]
AA123
BBBB1234
[/pre]


cynthia
data_null__
Jade | Level 19
The result you got by sorting is not random.

However, I will change the subject slightly as this type of sorting is most often, for me anyway, desired for SAS variable names that are members of an enumerated variable list. The answer to easy sorting is PROC CONTENTS.

[pre]
data test;
attrib Drv10 Drv1 Drv2 DRv11 M1 M10 M11 M2 length=8;
stop;
call missing(of _all_);
run;
proc contents noprint data=test order=ignorecase out=contents(keep=name);
run;
proc print;
run;
[/pre]

The result of the option IGNORECASE is demonstrated by DRv11. In otherwords proc contents knows all about how to sort variable names "the right way".

[pre]
Obs NAME

1 Drv1
2 Drv2
3 Drv10
4 DRv11
5 M1
6 M2
7 M10
8 M11
[/pre]
Cynthia_sas
SAS Super FREQ
Hi:
And, I should have asked whether you have SAS 9.2. If so, you're in luck, because the SORTSEQ option of PROC SORT was designed to handle these kinds of situations with the data.

Specifically, if you look up SORTSEQ=LINGUISTIC, there are a LOT of options, such as NUMERIC_COLLATION which may do exactly what you want in the PROC SORT step.
http://www2.sas.com/proceedings/forum2007/297-2007.pdf (see page 14)
http://support.sas.com/kb/31/369.html
http://support.sas.com/resources/papers/linguistic_collation.pdf

cynthia
S_Iftekhar
Calcite | Level 5
Thanks. I am using SAS 9.1.3.


I have successfully solved the problem using anydigit function, using the following commands

char=input(substr(string,1,(anydigit(string)-1)) ,$8.);
num = input(substr(string,anydigit(string)),best.);

then,

proc sort data=test2; by char num;
run;

I figured that could be done with 'SCAN' too ....

Thanks everybody, you have been of great help.
sbb
Lapis Lazuli | Level 10 sbb
Lapis Lazuli | Level 10
FYI - using the INPUT on the character variable is superfluous - you only need to do the SUBSTR and you will get the correct result in variable CHAR.

Scott Barry
SBBWorks, Inc.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 12 replies
  • 4669 views
  • 0 likes
  • 5 in conversation