BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
weijiawang
Fluorite | Level 6

Hi, I have a dataset like this:

id regimen

J0000, J0001, J2000

2 J0000, J0001, J2000, J1345

3 J2000, J1345, Q1111

4 J2000, Q1111, J1345, Q1112

 

How can I sort each line, base on alphabet and numeric order?

The result be like:

J0000, J0001, J2000

2 J0000, J0001, J1345, J2000

3 J1345, J2000, Q1111

4 J1345, J2000, Q1111, Q1112

 

I have a code like this:

DATA XX;
SET XX;
LENGTH newname $100;
ARRAY t(100) $ _TEMPORARY_;
CALL MISSING(OF t(*));
DO _N_=1 to LENGTH(STRIP(regimen));
t(_N_)=CHAR(regimen,_N_);
END;
CALL SORTC(of t(*));
newname=CATS(of t(*));
RUN;

But above code will combine by line all character and numeric together, while I still need the code to be ordered by X1234.

 

Thanks smart people!!!

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

So few points to check.

  • Make sure to eliminate leading spaces when parsing the words from the list
  • If you want them to ignore case then force the letters to uppercase
  • Make sure to clear your temporary array, values are retained unless cleared.  Or use non-temporary variables.

Example:

data have;
  infile cards truncover;
  input id regimen $80.;
cards;
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
5 J1070, j1000
;;;;

data want;
  set have;
  array words [100] $10 _temporary_;
  call missing(of words[*]);
  do index=1 to countw(regimen,',');
    words[index]=upcase(strip(scan(regimen,index,',')));
  end;
  call sortc(of words[*]);
  length regimen2 $80 ;
  regimen2=catx(', ',of words[*]);
  drop index;
run;

Results:

Obs    id    regimen                       regimen2

 1      1    J0000, J0001, J2000           J0000, J0001, J2000
 2      2    J0000, J0001, J2000, J1345    J0000, J0001, J1345, J2000
 3      3    J2000, J1345, Q1111           J1345, J2000, Q1111
 4      4    J2000, Q1111, J1345, Q1112    J1345, J2000, Q1111, Q1112
 5      5    J1070, j1000                  J1000, J1070

Without the UPCASE() the last record looks like this instead:

 5      5    J1070, j1000                  J1070, j1000

View solution in original post

9 REPLIES 9
ballardw
Super User

@weijiawang wrote:

Hi, I have a dataset like this:

id regimen

J0000, J0001, J2000

2 J0000, J0001, J2000, J1345

3 J2000, J1345, Q1111

4 J2000, Q1111, J1345, Q1112

 

How can I sort each line, base on alphabet and numeric order?

The result be like:

J0000, J0001, J2000

2 J0000, J0001, J1345, J2000

3 J1345, J2000, Q1111

4 J1345, J2000, Q1111, Q1112

 

I have a code like this:

DATA XX;
SET XX;
LENGTH newname $100;
ARRAY t(100) $ _TEMPORARY_;
CALL MISSING(OF t(*));
DO _N_=1 to LENGTH(STRIP(regimen));
t(_N_)=CHAR(regimen,_N_);
END;
CALL SORTC(of t(*));
newname=CATS(of t(*));
RUN;

But above code will combine by line all character and numeric together, while I still need the code to be ordered by X1234.

 

Thanks smart people!!!


Since you do not show anything that is X1234 I do not understand how it should be "ordered by X1234".

 

I don't understand why you are using CHAR since it pulls one character at a time.

I suspect that this might get closer:

DATA XX;
	SET XX;
	LENGTH newname $100;
	ARRAY t(100) $5 _TEMPORARY_;
	CALL MISSING(OF t(*));
	DO N=1 to Countw(regimen,',');
		t(N)=Scan(regimen,_N_,',');
	END;
	CALL SORTC(of t(*));
	newname=CATS(of t(*));
       drop n;
RUN;

Since _n_ is a  SAS automatic variable and often useful for other things I don't use it for random loop counters.

weijiawang
Fluorite | Level 6

Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.

For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.

 

Thank you so much for your help!!!

Kurt_Bremser
Super User

Transpose to a long format by using an array, then use proc sort: 

data want;
set have;
do i = 1 to countw(regimen,',');
  reg = scan(regimen,i,',');
  output;
end;
keep id reg;
run;

proc sort data=want;
by id reg;
run;
weijiawang
Fluorite | Level 6

Sorry that I was not clear about the sorting requirement. But here it is.

 

Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.

For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.

 

So, I am not sure if transpose the table will help?

 

Thank you so much for your help!!!

Tom
Super User Tom
Super User

@weijiawang wrote:

Sorry that I was not clear about the sorting requirement. But here it is.

 

Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.

For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.

 

So, I am not sure if transpose the table will help?

 

Thank you so much for your help!!!


It does not look like you need to do anything.  The character values will sort properly.

 

The digits used in storing numbers in strings will sort lexicographical in the same order as they would as numbers.  So '0001' < '0002' < '0003'.

 

You would only have an issue if you wanted to treat 'J59' < 'J100'.  As long as you have the leading zeros, like in your examples you will not have any issues with the sorting of the text strings.

weijiawang
Fluorite | Level 6

Hi, Tom

 

Currently, all my values are combination of 'one alphabet'+'4 digits'.

The above code is working, except that SAS is equalling 'J9000' and 'J9070': sometimes I get results J9000,j9070. Other times, I get results J9070,J9000.

 

So, I believe the problem is in "CALL SORTC": it is not correctly sorting the 4 digits. 

Do you have a possible solution?

 

Thanks

Tom
Super User Tom
Super User

@weijiawang wrote:

Hi, Tom

 

Currently, all my values are combination of 'one alphabet'+'4 digits'.

The above code is working, except that SAS is equalling 'J9000' and 'J9070': sometimes I get results J9000,j9070. Other times, I get results J9070,J9000.

 

So, I believe the problem is in "CALL SORTC": it is not correctly sorting the 4 digits. 

Do you have a possible solution?

 

Thanks


CALL SORTC does not care whether the characters are letters or digits.  It will NOT treat two different strings as equal.  It will sort all lower case letters after all uppercase letters.  The order is based on the ASCII codes used to store the characters. 

 

If you want the order to be case insensitive you will need to either change the way you are sorting or convert the values to use consistent case.  For example by using the UPCASE() function to modify the values stored in the variables.

Tom
Super User Tom
Super User

So few points to check.

  • Make sure to eliminate leading spaces when parsing the words from the list
  • If you want them to ignore case then force the letters to uppercase
  • Make sure to clear your temporary array, values are retained unless cleared.  Or use non-temporary variables.

Example:

data have;
  infile cards truncover;
  input id regimen $80.;
cards;
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
5 J1070, j1000
;;;;

data want;
  set have;
  array words [100] $10 _temporary_;
  call missing(of words[*]);
  do index=1 to countw(regimen,',');
    words[index]=upcase(strip(scan(regimen,index,',')));
  end;
  call sortc(of words[*]);
  length regimen2 $80 ;
  regimen2=catx(', ',of words[*]);
  drop index;
run;

Results:

Obs    id    regimen                       regimen2

 1      1    J0000, J0001, J2000           J0000, J0001, J2000
 2      2    J0000, J0001, J2000, J1345    J0000, J0001, J1345, J2000
 3      3    J2000, J1345, Q1111           J1345, J2000, Q1111
 4      4    J2000, Q1111, J1345, Q1112    J1345, J2000, Q1111, Q1112
 5      5    J1070, j1000                  J1000, J1070

Without the UPCASE() the last record looks like this instead:

 5      5    J1070, j1000                  J1070, j1000

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 9 replies
  • 2647 views
  • 5 likes
  • 4 in conversation