BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Kiko
Fluorite | Level 6

Hello 

 

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984). I found some variations including values that are fewer than 9 digits, missing (.), 0, and some values that doesn't have any posted meaning (e.g. 999999999 or 888888888) So far, I am considering anything except AAA-SS-XXXX as being unknown. Can someone help me figure out how to do this? 

 

Thank you!

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User
6 REPLIES 6
Reeza
Super User

What do you have so far? 

 

Are you looking for help with a regular expression or using BASE SAS functions?

Regular Expressions is likely faster, but it's not easy to understand or modify. 

ballardw
Super User

Please clarify what this means:

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984).

 

I would expect the example to look like 132-98-0984 from your "format" comment. By anychance has the SSN format been assigned to this variable?

Is this variable character or numeric? Your statement of values of . implies numeric but I've been fooled before.

 

You might post some example values and what you want as a result for those.

 

If some values are out of expected ranges such as your 999999999 then you should provide expected ranges.

Kiko
Fluorite | Level 6

Hello- 

Sorry for the confusion- yes, they are SSN and it is numeric. Here are a few examples: 

209876896

498304981

 54376548

   4326583

 

I basically want to grab every observation that is 9 digits thus exclude the followings: 

0 / '.'/ 999999999/ 888888888 

 

I also found a few cases that are 9 digits, but start with 888 which I am not sure how to deal with, but would like to include them for now. 

 

Thank you! 

 

ballardw
Super User

In a data step:

 

if variablename in (. 0 99999999 888888888) then <do what ever>.

If you are actually creating a subset data with these values then

 

data want;

   set have;

   if variablename in (. 0 99999999 888888888);

run;

 

to exclude:

data want;

   set have;

   if variablename not in (. 0 99999999 888888888);

run;

 

Or use a dataset option where clause (where=( variablename not in (. 0 99999999 888888888))) if you don't want to create a separate data set but use the same data set for analysis.

 

Reeza
Super User
Search LexJansen.com for example macros/code:
http://analytics.ncsu.edu/sesug/2007/PO23.pdf

Kiko
Fluorite | Level 6

Thank you both! super helpful 

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 559 views
  • 0 likes
  • 3 in conversation