BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Sam20001
Fluorite | Level 6

Hello, 

 

I would like to know how I can separate part of a variable. The values for the variable named ID look like this: AAID1248987441122-998-IDSID-0-1. I need to only extract the 13 digits after the letter D and before the first hyphen (i.e., 1248987441122. the new variable name will be sid). I have tried scan and compress but neither produces what I need. 

 

sid = scan(ID,4) results in a value of only 1 

 

sid = compress(ID, """,'A') results in a value like this 1248987441122-998-0-1

 

Your help would be greatly appreciated. 

 

Thank you.

1 ACCEPTED SOLUTION

Accepted Solutions
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sam20001 

 

You could also use the PRXCHANGE function. It is more flexible, so it can be coded to handle many different input formats. The learning courve is a bit steep if you are unfamiliar with the SAS PRX functions, and I recomment the PRX Tip sheet as a great way of getting started: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

 

In your case, the string seems to be machine generated, so a flexible solution is not called for. Other contributors have suggested different solutions that work perfectily well on your example string, so I am just beating the drum over the use of PRX functions; they have saved me the trouble of coding many lines of complicated code over the years. They are known to be ineffctive, but in my experience it's not a problem worth considering unless input is counted in millions of observations.

 

data have;
  length ID $40;
  input ID $char40.;
  datalines;
AAID1248987441122-998-IDSID-0-1
ID1248987441122-998--0-1
1248987441122
123
AAID1-998-IDSID-0-1
AAIN1248987441122_998-IDSID-0-1
AAID-1248987441122DSID-0-1
;
data want;
  set have;
  SID = prxchange('s/(\D+)(\d*)(\D.*)/$2/',-1,ID);

  * other suggestions in this post;
  SID2 = compress(scan(ID, 1, "-"), , 'kd');
  SID3 = scan(scan(ID,1,'-'),-2,'kd');
  SID4 = scan(ID,2,'D');
  SID5 = scan(scan(ID,2,'D'),1,'-');
run;

 

View solution in original post

5 REPLIES 5
Reeza
Super User
SID = compress(scan(ID, 1, "-"), , 'kd');

Use both COMPRESS + SCAN.

SCAN to get the first portion of the text and COMPRESS to remove the first 4 characters.

Or if the text is always a fixed length use SUBSTR.

SID = substr(ID, 5, 13);
Patrick
Opal | Level 21

One or both of below two options should work.

data sample;
  infile datalines truncover;
  input have :$31.;
  length want1 want2 $13;
  want1=scan(scan(have,1,'-'),-2,'kd');
  /* if the wanted digits are always on the same position and it's always 13 digits */ 
  /*  then substr() will work as well */
  want2=substr(have,5,13);
  datalines;
AAID1248987441122-998-IDSID-0-1
 
AAID99999999-998-IDSID-0-1
;

proc print data=sample;
run;

Patrick_0-1673665811767.png

s_lassen
Meteorite | Level 14

You can try something like this:

data want;
  set have;
  length sid $13;
  sid=scan(ID,2,'D');
run;

That will work if you always have 13 digits after the "D". If the digit string is sometimes shorter (so that you get the hyphen and possibly other stuff in the SID variable, you could change it to:

data want;
  set have;
  length sid $13;
  sid=scan(scan(ID,2,'D'),1,'-');
run;
ErikLund_Jensen
Rhodochrosite | Level 12

Hi @Sam20001 

 

You could also use the PRXCHANGE function. It is more flexible, so it can be coded to handle many different input formats. The learning courve is a bit steep if you are unfamiliar with the SAS PRX functions, and I recomment the PRX Tip sheet as a great way of getting started: https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf

 

In your case, the string seems to be machine generated, so a flexible solution is not called for. Other contributors have suggested different solutions that work perfectily well on your example string, so I am just beating the drum over the use of PRX functions; they have saved me the trouble of coding many lines of complicated code over the years. They are known to be ineffctive, but in my experience it's not a problem worth considering unless input is counted in millions of observations.

 

data have;
  length ID $40;
  input ID $char40.;
  datalines;
AAID1248987441122-998-IDSID-0-1
ID1248987441122-998--0-1
1248987441122
123
AAID1-998-IDSID-0-1
AAIN1248987441122_998-IDSID-0-1
AAID-1248987441122DSID-0-1
;
data want;
  set have;
  SID = prxchange('s/(\D+)(\d*)(\D.*)/$2/',-1,ID);

  * other suggestions in this post;
  SID2 = compress(scan(ID, 1, "-"), , 'kd');
  SID3 = scan(scan(ID,1,'-'),-2,'kd');
  SID4 = scan(ID,2,'D');
  SID5 = scan(scan(ID,2,'D'),1,'-');
run;

 

Sam20001
Fluorite | Level 6

All the examples work. Thank you so much for your help with this! 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 754 views
  • 5 likes
  • 5 in conversation