Kiko
Fluorite | Level 6

Hello 

 

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984). I found some variations, including values that are fewer than 9 digits, missing (.), 0, and some values that don't have any posted meaning (e.g. 999999999 or 888888888). So far, I am treating anything except AAA-SS-XXXX as unknown. Can someone help me figure out how to do this?

 

Thank you!

 

 

Reeza
Super User

What do you have so far? 

 

Are you looking for help with a regular expression or using BASE SAS functions?

A regular expression is likely faster, but it's not easy to understand or modify.

ballardw
Super User

Please clarify what this means:

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984).

 

I would expect the example to look like 132-98-0984 from your "format" comment. By any chance, has the SSN format been assigned to this variable?

Is this variable character or numeric? Your statement of values of . implies numeric but I've been fooled before.

 

You might post some example values and what you want as a result for those.

 

If some values are outside the expected ranges, such as your 999999999, then you should provide the expected ranges.

Kiko
Fluorite | Level 6

Hello- 

Sorry for the confusion. Yes, they are SSNs and the variable is numeric. Here are a few examples:

209876896

498304981

 54376548

   4326583

 

I basically want to grab every observation that is 9 digits and thus exclude the following:

0 / '.'/ 999999999/ 888888888 

 

I also found a few cases that are 9 digits but start with 888. I am not sure how to deal with them, but I would like to include them for now.

 

Thank you! 

 

ballardw
Super User

In a data step:

 

if variablename in (. 0 999999999 888888888) then <do whatever>;
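As one possible reading of "do whatever", the check could set a flag rather than drop rows, which matches the idea of treating those values as unknown. This is only a sketch: `have` and `variablename` are the placeholder names used in this reply, while `flagged` and `ssn_status` are hypothetical names made up for illustration.

```sas
data flagged;
   set have;
   /* ssn_status is a hypothetical flag variable, not part of the original data */
   length ssn_status $7;
   /* Missing, zero, and the two filler values are treated as unknown */
   if variablename in (. 0 999999999 888888888) then ssn_status = 'Unknown';
   else ssn_status = 'Known';
run;
```

Every observation is kept, so the known and unknown groups can be counted or compared before deciding what to drop.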

If you are actually creating a subset data set with these values, then:

 

data want;

   set have;

   if variablename in (. 0 999999999 888888888);

run;

 

To exclude them:

data want;

   set have;

   if variablename not in (. 0 999999999 888888888);

run;

 

Or use a data set option WHERE clause, (where=(variablename not in (. 0 999999999 888888888))), if you don't want to create a separate data set but want to use the same data set for analysis.
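The list-based checks above do not remove values with fewer than 9 digits, which the original question also wanted to leave out. Since the variable is numeric, a range check might cover that case. A minimal sketch, assuming the same placeholder names `have`, `want`, and `variablename`:

```sas
data want;
   set have;
   /* A numeric value has exactly nine digits when it falls in this range. */
   /* Missing (.), 0, and shorter values all fall below the lower bound.   */
   /* The two filler values are nine digits long, so they are excluded     */
   /* explicitly; other values that start with 888 are kept.               */
   if 100000000 <= variablename <= 999999999
      and variablename not in (888888888 999999999);
run;
```

Note that a numeric variable drops leading zeros, so a number whose first digit was 0 is stored with 8 or fewer digits and is excluded by this check. That follows the rule stated in the question, but it is worth confirming that it is the intended behavior.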

 

Reeza
Super User
Search LexJansen.com for example macros/code:
http://analytics.ncsu.edu/sesug/2007/PO23.pdf

Kiko
Fluorite | Level 6

Thank you both! Super helpful.
