Dear Folks,
I need the best function logic, with performance in mind, for extracting a variable. The dataset is large, with more than 18 million records, so a solution aimed at performance would be most appreciated. My requirement is as follows.
My input has a variable ID as follows:
ID ORDER (My requirement)
1 hkjhbjbh 1
2 hgjkhkjhb 2
3 hklhnklh 3
4 hnkljnlk 4
x1 kjhbjkhb 1
x2 hklhjlkjl 2
x3 jljmkljml 3
y1 bkbhjk 1
y2 hnkljnkl 2
y3 nklnkl 3
a1 kjnkl 1
a2 jjlj 2
u1 jkhkjklkl 1
u2 hlkhjklhk 2
tyytuiyt 1
tuyiuyiu 2
As you can see, the ORDER variable needs to be extracted from the ID, but the ID is not consistent: in some cases the number follows a letter, as in x1. IDs that do not contain a number should be assigned an order directly (1, 2, ... as shown above). Would SCAN and SUBSTR do the job? If so, how?
Many Thanks,
Charlotte from England
Hi,
Can the number you want to extract be any number of digits? E.g., is "1234abcd" to give "1234" or something else?
Will there ever be digits separated by letters and, if so, what result would you want? E.g., is "abc1def2gh" to give "1" or "12" or "2" or something else?
If there is a number, will it only ever be in the first two bytes of the id or could it be in any position?
Regards,
Amir.
Hi,
Thanks for the response. No, the number I want to extract does not have an arbitrary number of digits. From what I have seen, it never exceeds 9, so I am assuming the display order only ranges from 1 to 9. I have not scrolled through the whole dataset to check whether it goes beyond that range; if it does, your question makes complete sense, but I doubt it. As for the position of the number, it is exactly as shown in my question. It would still be great to have a solution that can derive the order number from anywhere within the variable value, but if that were needed the data pattern would be very hard for me to make sense of. I don't think the data is that varied.
Charlotte
Hi,
You could try the following, which should extract the number from any position. I have assumed "order" is a numeric variable.
data have;
   input id $char15.;
datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;
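data want(drop=neworder);
   set have;
   /* running counter for IDs that contain no digit */
   retain neworder 0;
   /* 'dk' keeps only the digits in ID; a result with no digits reads as missing */
   order=input(compress(id,,'dk'),8.);
   if order=. then
      do;
         neworder+1;
         order=neworder;
      end;
run;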
Regards,
Amir.
Message was edited by: Amir Malik - format some code
Charlotte,
See if this does what you want:
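data test(keep=id order);
   length id $15 order neworder 4;
   input id $15.;
   retain neworder;
   /* position of the first digit in ID (0 if there is none) */
   dp=anydigit(id,1);
   if (dp > 0) then
      do;
         /* position of the first non-digit after that digit */
         cp=notdigit(id,dp);
         /* digits run to the end of ID: point just past the last character */
         if cp=0 then cp=length(id)+1;
         order=input(substr(id,dp,(cp-dp)),4.);
      end;
   else
      do;
         /* no digit in ID: number these rows sequentially */
         neworder+1;
         order=neworder;
      end;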
datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;
run;
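The two approaches in this thread agree on the sample data but differ when an ID has digits separated by letters, which is the case raised in the earlier question about "abc1def2gh". The following is only a minimal sketch to show that difference side by side. The dataset name compare, the variables all_digits and first_run, and the test IDs "abc1def2gh" and "1234abcd" are made up for illustration and are not part of the original data.

```sas
/* Hypothetical comparison: names and the extra test IDs are assumptions */
data compare;
   length id $15;
   input id $15.;
   /* approach 1: keep every digit in ID, wherever it appears */
   all_digits=input(compress(id,,'dk'),8.);
   /* approach 2: keep only the first run of consecutive digits */
   dp=anydigit(id);
   if dp>0 then
      do;
         cp=notdigit(id,dp);
         /* digits run to the end of ID */
         if cp=0 then cp=length(id)+1;
         first_run=input(substr(id,dp,cp-dp),8.);
      end;
   drop dp cp;
datalines;
x1 kjhbjkhb
abc1def2gh
1234abcd
tyytuiyt
;
run;

proc print data=compare;
run;
```

For "abc1def2gh" the COMPRESS approach gives 12 while the ANYDIGIT/NOTDIGIT approach gives 1; for the other rows both give the same value, and both are missing for "tyytuiyt". Which behaviour is wanted depends on the answer to that earlier question.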