CharlotteCain
Quartz | Level 8

Dear Folks,

I need efficient logic for extracting a variable. The dataset is large, with more than 18 million records, so a solution aimed at best performance would be most appreciated. My requirement is as follows.

My input has a variable ID as follows:

ID               ORDER (my requirement)
1 hkjhbjbh       1
2 hgjkhkjhb      2
3 hklhnklh       3
4 hnkljnlk       4
x1 kjhbjkhb      1
x2 hklhjlkjl     2
x3 jljmkljml     3
y1 bkbhjk        1
y2 hnkljnkl      2
y3 nklnkl        3
a1 kjnkl         1
a2 jjlj          2
u1 jkhkjklkl     1
u2 hlkhjklhk     2
tyytuiyt         1
tuyiuyiu         2

You will notice that the ORDER variable needs to be extracted from the ID, but the ID is not consistent: in some rows the number follows a letter, as in x1. IDs that do not contain a number are to be assigned an order directly, in sequence, as in the last two rows. Will SCAN and SUBSTR do? If so, how?

Many Thanks,

Charlotte from England

Amir
PROC Star

Hi,

Can the number you want to extract be any number of digits? E.g., is "1234abcd" to give "1234" or something else?

Will there ever be digits separated by letters, and if so, what result would you want? E.g., is "abc1def2gh" to give "1" or "12" or "2" or something else?

If there is a number will it only ever be in the first two bytes of the id or could it be in any position?

Regards,

Amir.

CharlotteCain
Quartz | Level 8

Hi,

Thanks for the response. No, the number I want to extract is not any number of digits. From what I have seen, it never exceeds 9, so I am assuming the display order only ranges from 1 to 9. I have not scrolled through the whole dataset to check whether it goes beyond that range; if it does, your question makes complete sense, but I suspect it does not. As for the position of the number, it is exactly as I wrote in the question. That said, it would be great to have a solution that can derive the order number from anywhere within the variable value, although if the data were that varied, the pattern would be very hard for me to comprehend. I don't think it is that varied.

Charlotte

Amir
PROC Star

Hi,

You could try the following which should extract the number from any position. I have assumed "order" is a numeric variable.

data have;
  input id $char15.;
  datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;

data want(drop=neworder);
  set have;
  retain neworder 0;
  order=input(compress(id,,'dk'),8.);
  if order=. then
  do;
    neworder+1;
    order=neworder;
  end;
run;

Regards,

Amir.

Message was edited by: Amir Malik - format some code
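
On the original "will SUBSTR do?" question, here is a narrower sketch. It assumes, as Charlotte describes above, that the order is a single digit from 1 to 9 and that it only ever appears in the first two characters of ID. It reuses the `have` dataset from the code above; `want_first2` and `pos` are hypothetical names chosen for illustration. No performance claim is made for 18 million rows without testing on the real data.

```sas
data want_first2(drop=pos neworder);
  set have;
  /* Assumption: any order digit sits in characters 1-2 of id */
  pos = anydigit(substr(id, 1, 2));
  if pos > 0 then
    /* Assumption: the order is a single digit, so read one character */
    order = input(substr(id, pos, 1), 1.);
  else
  do;
    /* No digit in the first two characters: number sequentially */
    neworder + 1;
    order = neworder;
  end;
run;
```

If the real data turns out to contain orders above 9 or digits further along the ID, this sketch would give wrong results and the more general approaches in this thread would be the safer choice.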

AhmedAl_Attar
Rhodochrosite | Level 12

Charlotte,

See if this does what you want,

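data test(keep=id order);
    length id $15 order neworder 4;
    input id $15.;
    retain neworder;
    /* position of the first digit in id (0 if there is none) */
    dp=anydigit(id,1);
    if (dp > 0) then
    do;
        /* position of the first non-digit after that digit */
        cp=notdigit(id,dp);
        /* guard: the digits run to the end of a full-length id */
        if cp = 0 then cp=length(id)+1;
        order=input(substr(id,dp,(cp-dp)),4.);
    end;
    else
    do;
        /* no digit found: number these ids sequentially */
        neworder+1;
        order=neworder;
    end;
datalines;
1 hkjhbjbh
2 hgjkhkjhb
3 hklhnklh
4 hnkljnlk
x1 kjhbjkhb
x2 hklhjlkjl
x3 jljmkljml
y1 bkbhjk
y2 hnkljnkl
y3 nklnkl
a1 kjnkl
a2 jjlj
u1 jkhkjklkl
u2 hlkhjklhk
tyytuiyt
tuyiuyiu
;
run;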
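
The two approaches in this thread agree on the sample data but differ when digits are separated by letters, which is the case Amir asked about. The sketch below runs both expressions on his example value "abc1def2gh"; the variable names `all_digits` and `first_run` are hypothetical and used only for illustration.

```sas
data _null_;
  id = 'abc1def2gh';
  /* COMPRESS with 'dk' keeps every digit, so separated digits are joined */
  all_digits = input(compress(id, , 'dk'), 8.);
  /* ANYDIGIT/NOTDIGIT isolate only the first consecutive run of digits */
  dp = anydigit(id);
  cp = notdigit(id, dp);
  first_run = input(substr(id, dp, cp - dp), 8.);
  /* writes all_digits=12 first_run=1 to the log */
  put all_digits= first_run=;
run;
```

Which result is right depends on what such an ID should mean in the real data, which the thread leaves open.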
