Hi, I have a dataset like this:
id regimen
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
How can I sort each line, base on alphabet and numeric order?
The result be like:
1 J0000, J0001, J2000
2 J0000, J0001, J1345, J2000
3 J1345, J2000, Q1111
4 J1345, J2000, Q1111, Q1112
I have a code like this:
DATA XX;
SET XX;
LENGTH newname $100;
ARRAY t(100) $ _TEMPORARY_;
CALL MISSING(OF t(*));
DO _N_=1 to LENGTH(STRIP(regimen));
t(_N_)=CHAR(regimen,_N_);
END;
CALL SORTC(of t(*));
newname=CATS(of t(*));
RUN;
But above code will combine by line all character and numeric together, while I still need the code to be ordered by X1234.
Thanks smart people!!!
So few points to check.
Example:
data have;
infile cards truncover;
input id regimen $80.;
cards;
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
5 J1070, j1000
;;;;
data want;
set have;
array words [100] $10 _temporary_;
call missing(of words[*]);
do index=1 to countw(regimen,',');
words[index]=upcase(strip(scan(regimen,index,',')));
end;
call sortc(of words[*]);
length regimen2 $80 ;
regimen2=catx(', ',of words[*]);
drop index;
run;
Results:
Obs id regimen regimen2 1 1 J0000, J0001, J2000 J0000, J0001, J2000 2 2 J0000, J0001, J2000, J1345 J0000, J0001, J1345, J2000 3 3 J2000, J1345, Q1111 J1345, J2000, Q1111 4 4 J2000, Q1111, J1345, Q1112 J1345, J2000, Q1111, Q1112 5 5 J1070, j1000 J1000, J1070
Without the UPCASE() the last record looks like this instead:
5 5 J1070, j1000 J1070, j1000
@weijiawang wrote:
Hi, I have a dataset like this:
id regimen
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
How can I sort each line, base on alphabet and numeric order?
The result be like:
1 J0000, J0001, J2000
2 J0000, J0001, J1345, J2000
3 J1345, J2000, Q1111
4 J1345, J2000, Q1111, Q1112
I have a code like this:
DATA XX;
SET XX;
LENGTH newname $100;
ARRAY t(100) $ _TEMPORARY_;
CALL MISSING(OF t(*));
DO _N_=1 to LENGTH(STRIP(regimen));
t(_N_)=CHAR(regimen,_N_);
END;
CALL SORTC(of t(*));
newname=CATS(of t(*));
RUN;But above code will combine by line all character and numeric together, while I still need the code to be ordered by X1234.
Thanks smart people!!!
Since you do not show anything that is X1234 I do not understand how it should be "ordered by X1234".
I don't understand why you are using CHAR since it pulls one character at a time.
I suspect that this might get closer:
DATA XX;
SET XX;
LENGTH newname $100;
ARRAY t(100) $5 _TEMPORARY_;
CALL MISSING(OF t(*));
DO N=1 to Countw(regimen,',');
t(N)=Scan(regimen,_N_,',');
END;
CALL SORTC(of t(*));
newname=CATS(of t(*));
drop n;
RUN;
Since _n_ is a SAS automatic variable and often useful for other things I don't use it for random loop counters.
Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.
For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.
Thank you so much for your help!!!
Transpose to a long format by using an array, then use proc sort:
data want;
set have;
do i = 1 to countw(regimen,',');
reg = scan(regimen,i,',');
output;
end;
keep id reg;
run;
proc sort data=want;
by id reg;
run;
Sorry that I was not clear about the sorting requirement. But here it is.
Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.
For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.
So, I am not sure if transpose the table will help?
Thank you so much for your help!!!
@weijiawang wrote:
Sorry that I was not clear about the sorting requirement. But here it is.
Using "X1234", I meant that I need to sort on this order: first on alphabet, then on numeric after the alphabet.
For example: J0000 should come first, then J0001, then J0002....then Q0001, Q0002....then T0001, T0002.
So, I am not sure if transpose the table will help?
Thank you so much for your help!!!
It does not look like you need to do anything. The character values will sort properly.
The digits used in storing numbers in strings will sort lexicographical in the same order as they would as numbers. So '0001' < '0002' < '0003'.
You would only have an issue if you wanted to treat 'J59' < 'J100'. As long as you have the leading zeros, like in your examples you will not have any issues with the sorting of the text strings.
Hi, Tom
Currently, all my values are combination of 'one alphabet'+'4 digits'.
The above code is working, except that SAS is equalling 'J9000' and 'J9070': sometimes I get results J9000,j9070. Other times, I get results J9070,J9000.
So, I believe the problem is in "CALL SORTC": it is not correctly sorting the 4 digits.
Do you have a possible solution?
Thanks
@weijiawang wrote:
Hi, Tom
Currently, all my values are combination of 'one alphabet'+'4 digits'.
The above code is working, except that SAS is equalling 'J9000' and 'J9070': sometimes I get results J9000,j9070. Other times, I get results J9070,J9000.
So, I believe the problem is in "CALL SORTC": it is not correctly sorting the 4 digits.
Do you have a possible solution?
Thanks
CALL SORTC does not care whether the characters are letters or digits. It will NOT treat two different strings as equal. It will sort all lower case letters after all uppercase letters. The order is based on the ASCII codes used to store the characters.
If you want the order to be case insensitive you will need to either change the way you are sorting or convert the values to use consistent case. For example by using the UPCASE() function to modify the values stored in the variables.
So few points to check.
Example:
data have;
infile cards truncover;
input id regimen $80.;
cards;
1 J0000, J0001, J2000
2 J0000, J0001, J2000, J1345
3 J2000, J1345, Q1111
4 J2000, Q1111, J1345, Q1112
5 J1070, j1000
;;;;
data want;
set have;
array words [100] $10 _temporary_;
call missing(of words[*]);
do index=1 to countw(regimen,',');
words[index]=upcase(strip(scan(regimen,index,',')));
end;
call sortc(of words[*]);
length regimen2 $80 ;
regimen2=catx(', ',of words[*]);
drop index;
run;
Results:
Obs id regimen regimen2 1 1 J0000, J0001, J2000 J0000, J0001, J2000 2 2 J0000, J0001, J2000, J1345 J0000, J0001, J1345, J2000 3 3 J2000, J1345, Q1111 J1345, J2000, Q1111 4 4 J2000, Q1111, J1345, Q1112 J1345, J2000, Q1111, Q1112 5 5 J1070, j1000 J1000, J1070
Without the UPCASE() the last record looks like this instead:
5 5 J1070, j1000 J1070, j1000
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.