Help using Base SAS procedures

Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Reply
New Contributor
Posts: 4

Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Hello,

I am sure this has been asked before, sorry cannot find it.

I am trying to sort a string variable in SAS 9.1, that appears in the following format
Variable (actually appears in a random order)
Drv1
Drv2
....

Drv10
Drv11
M1
M2
...

M10
M11
..


but when I sort this variable, Drv10 11 comes before Drv2 (so it becomes Drv1 Drv10 Drv11 ... Drv19 Drv2 ... for all the variables), How can I stop that and sort in a way that allows 2 to come before 10. ? Message was edited by: S.Iftekhar
Super Contributor
Super Contributor
Posts: 3,174

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
Extract the numeric portion and the character portion as two variables or recreate a "formatted" variable for sorting using another (DATA step, internal) variable and format it with SAS FORMAT like Z4.

Scott Barry
SBBWorks, Inc.
New Contributor
Posts: 4

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Can you help with this process - "Extract the numeric portion and the character portion as two variables"

I guess if I can do this (seperate character and numeric variables), I can use a 'proc sort' with both variables in 'by' option, and then concatenate them.
Super Contributor
Super Contributor
Posts: 3,174

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
Yes, you would use PROC SORT with the separated variables.

Use one of the various SAS functions within an assignment statement, such as:

data _null_;
string = 'ABC111.2222222222223';
char = substr(string,1,3);
num = input(substr(string,anydigit(string)),best.);
putlog _all_;
run;


Scott Barry
SBBWorks, Inc.
New Contributor
Posts: 4

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Thanks for that. That did work for the numeric part.


But not working for the string part as the string part of the variable is not always 3 character long. How can I extract the character part without hardcoding the length?

the variable is always in a format (character)||(numeric) ... just the character part has varying legth.
Super Contributor
Super Contributor
Posts: 3,174

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
Suggest you review the SAS documentation and apply what was provided as an example -- focus on the SUBSTR function further to assign the character-only portion.

Scott Barry
SBBWorks, Inc.
Regular Contributor
Posts: 165

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
If the first character is unique to the length of the string part you could use if statements.

Im sure there are better ways that would not involve hard coding though.
Super Contributor
Super Contributor
Posts: 3,174

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Here's another hint - the ANYDIGIT function is important to identify the first numeric - consider using it with SUBSTR.

Scott Barry
SBBWorks, Inc.
SAS Super FREQ
Posts: 8,868

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
Hi:
Look at the examples of the COMPRESS function. You can use it to compress specific characters out of a text string. For example, if you compressed all Alpha characters out of a string, what would be left would be numbers and if you compressed all numbers out of a string, what would be left would be the alpha characters. There are some good examples of the COMPRESS function in the documentation.

In addition, the SCAN function will break a text string up into "chunks", based on a delimiter. So, for example, in the text strings:
[pre]
AA123
ZZZ4567
[/pre]

it doesn't matter how "long" the alpha characters' portion is...the "dividing" line between the alpha portion and the number portion is the 1st occurence of any number. The SCAN function will break a text string into chunks, based on the delimiter and/or modifier that you specify. Again, the documentation has some good examples of using the SCAN function.

When you use either the COMPRESS or the SCAN function, your digits/numbers will still be stored in character format until you convert them to be numeric variables using the INPUT function....only then will they sort correctly ... then they are numbers.

And, to continue the "fun with functions" you might also look at the ANYDIGIT function which will tell you the starting position of any digit in a character string. So, for example, with these 2 text strings, ANYDIGIT would tell you that the '1' is in position 3 (which means that the last alpha character is in position 2) and for the second string would tell you that the '1' is in position 5 (which means that the last alpha character is in position 4).
[pre]
AA123
BBBB1234
[/pre]


cynthia
Respected Advisor
Posts: 3,799

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
The result you got by sorting is not random.

However, I will change the subject slightly as this type of sorting is most often, for me anyway, desired for SAS variable names that are members of an enumerated variable list. The answer to easy sorting is PROC CONTENTS.

[pre]
data test;
attrib Drv10 Drv1 Drv2 DRv11 M1 M10 M11 M2 length=8;
stop;
call missing(of _all_);
run;
proc contents noprint data=test order=ignorecase out=contents(keep=name);
run;
proc print;
run;
[/pre]

The result of the option IGNORECASE is demonstrated by DRv11. In otherwords proc contents knows all about how to sort variable names "the right way".

[pre]
Obs NAME

1 Drv1
2 Drv2
3 Drv10
4 DRv11
5 M1
6 M2
7 M10
8 M11
[/pre]
SAS Super FREQ
Posts: 8,868

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to data_null__
Hi:
And, I should have asked whether you have SAS 9.2. If so, you're in luck, because the SORTSEQ option of PROC SORT was designed to handle these kinds of situations with the data.

Specifically, if you look up SORTSEQ=LINGUISTIC, there are a LOT of options, such as NUMERIC_COLLATION which may do exactly what you want in the PROC SORT step.
http://www2.sas.com/proceedings/forum2007/297-2007.pdf (see page 14)
http://support.sas.com/kb/31/369.html
http://support.sas.com/resources/papers/linguistic_collation.pdf

cynthia
New Contributor
Posts: 4

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to Cynthia_sas
Thanks. I am using SAS 9.1.3.


I have successfully solved the problem using anydigit function, using the following commands

char=input(substr(string,1,(anydigit(string)-1)) ,$8.);
num = input(substr(string,anydigit(string)),best.);

then,

proc sort data=test2; by char num;
run;

I figured that could be done with 'SCAN' too ....

Thanks everybody, you have been of great help.
Super Contributor
Super Contributor
Posts: 3,174

Re: Proc Sort on a String Variable - How to get 'abc2' appear before 'abc10'

Posted in reply to S_Iftekhar
FYI - using the INPUT on the character variable is superfluous - you only need to do the SUBSTR and you will get the correct result in variable CHAR.

Scott Barry
SBBWorks, Inc.
Ask a Question
Discussion stats
  • 12 replies
  • 820 views
  • 0 likes
  • 5 in conversation