CharlotteCain
Quartz | Level 8

Dear Folks,

I am looking for the best-performing logic to extract a variable. The dataset is large, with more than 18 million records, so a solution aimed at performance would be most appreciated. The requirement is as follows.

My input has a variable ID as follows:

  ID              ORDER (my requirement)
  1 hkjhbjbh      1
  2 hgjkhkjhb     2
  3 hklhnklh      3
  4 hnkljnlk      4
  x1 kjhbjkhb     1
  x2 hklhjlkjl    2
  x3 jljmkljml    3
  y1 bkbhjk       1
  y2 hnkljnkl     2
  y3 nklnkl       3
  a1 kjnkl        1
  a2 jjlj         2
  u1 jkhkjklkl    1
  u2 hlkhjklhk    2
  tyytuiyt        1
  tuyiuyiu        2

As you can see, the ORDER variable needs to be extracted from ID, but the position of the number in ID is not consistent (see x1, for example). IDs that do not contain a number should simply be numbered in sequence, as in the last two rows. Will SCAN and SUBSTR do? If so, how?

Many Thanks,

Charlotte from England

Amir
PROC Star

Hi,

Can the number you want to extract be any number of digits? E.g., is "1234abcd" to give "1234" or something else?

Will there ever be digits separated by letters, and if so, what result would you want? E.g., is "abc1def2gh" to give "1" or "12" or "2" or something else?

If there is a number will it only ever be in the first two bytes of the id or could it be in any position?

Regards,

Amir.

CharlotteCain
Quartz | Level 8

Hi,

Thanks for the response. No, the number I want to extract cannot be any number of digits. From what I have seen, it never exceeds 9, so I am assuming the display order only ranges from 1 to 9. I have not scrolled through the whole dataset to check whether it goes beyond that range. If it does, your question makes complete sense, but I doubt it. As for the position of the number, it is exactly as I wrote in the question. Nevertheless, it would be great to have a solution that can derive the order number from anywhere within the variable value, although if that were really needed, the data pattern would be too messy for me to comprehend. I do not think it is that varied.

Charlotte

Amir
PROC Star

Hi,

You could try the following which should extract the number from any position. I have assumed "order" is a numeric variable.

data have;
  input id $char15.;
  datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;

data want(drop=neworder);
  set have;
  retain neworder 0;
  /* keep only the digits in id, then convert them to a number */
  order=input(compress(id,,'dk'),8.);
  /* no digits found: number these rows sequentially */
  if order=. then
  do;
    neworder+1;
    order=neworder;
  end;
run;

Regards,

Amir.

Message was edited by: Amir Malik - format some code
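
To make the behaviour on the edge cases Amir asked about concrete, here is a small sketch comparing two extraction styles. With the 'dk' modifiers, COMPRESS keeps every digit in the value, so digits separated by letters are concatenated. Taking a single character at the ANYDIGIT position returns only the first digit. The test values and the variable names all_digits and first_digit are illustrative assumptions, not part of the original data.

```sas
data _null_;
  length id $15;
  /* illustrative test values only; each one contains at least one digit */
  do id = '1234abcd', 'abc1def2gh', 'x1 kjhbjkhb';
    /* 'd' = digits, 'k' = keep: every digit in id is retained */
    all_digits = compress(id, , 'dk');
    /* one character at the position of the first digit */
    first_digit = substr(id, anydigit(id), 1);
    put id= all_digits= first_digit=;
  end;
run;
```

Which of the two results is wanted depends on the answers to the questions above. For single-digit orders in a fixed area of ID they should agree.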

AhmedAl_Attar
Ammonite | Level 13

Charlotte,

See if this does what you want,

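data test(keep=id order);
    length id $15 order neworder 4;
    input id $15.;
    retain neworder;
    /* position of the first digit in id (0 if there is none) */
    dp=anydigit(id,1);
    if (dp > 0) then
    do;
        /* first non-digit at or after dp; 0 if the digits run to the end of id */
        cp=notdigit(id,dp);
        if (cp = 0) then cp=length(id)+1;
        order=input(substr(id,dp,(cp-dp)),4.);
    end;
    else
    do;
        /* no digits found: number these rows sequentially */
        neworder+1;
        order=neworder;
    end;
datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;
run;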
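
Since Charlotte describes the number as a single digit from 1 to 9 that sits in the first or second byte of ID, a narrower check may be enough for the 18 million rows. The following is only a sketch under exactly those assumptions. It reads the dataset have created in Amir's reply, and c1 and c2 are helper names introduced here. It has not been timed against the solutions above.

```sas
data want(drop=c1 c2 neworder);
  set have;
  /* assumption: the order digit, if present, is in byte 1 or byte 2 */
  c1 = char(id, 1);
  c2 = char(id, 2);
  if '1' <= c1 <= '9' then order = input(c1, 1.);
  else if '1' <= c2 <= '9' then order = input(c2, 1.);
  else
  do;
    /* no digit in the first two bytes: number these rows sequentially */
    neworder + 1;
    order = neworder;
  end;
run;
```

If the data turn out to contain multi-digit numbers, or digits further into ID, this shortcut would give wrong results and one of the more general versions is the safer choice.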
