Subsetting data

Accepted Solution Solved
Reply
Contributor
Posts: 27
Accepted Solution

Subsetting data

Hello 

 

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984). I found some variations including values that are fewer than 9 digits, missing (.), 0, and some values that doesn't have any posted meaning (e.g. 999999999 or 888888888) So far, I am considering anything except AAA-SS-XXXX as being unknown. Can someone help me figure out how to do this? 

 

Thank you!

 

 


Accepted Solutions
Solution
‎09-14-2017 11:45 AM
Super User
Posts: 23,262

Re: Subsetting data


All Replies
Super User
Posts: 23,262

Re: Subsetting data

What do you have so far? 

 

Are you looking for help with a regular expression or using BASE SAS functions?

Regular Expressions is likely faster, but it's not easy to understand or modify. 

Super User
Posts: 13,304

Re: Subsetting data

Please clarify what this means:

I am trying to subset observations that are 9 digits and in AAA-SS-XXXX format (e.g. 132980984).

 

I would expect the example to look like 132-98-0984 from your "format" comment. By anychance has the SSN format been assigned to this variable?

Is this variable character or numeric? Your statement of values of . implies numeric but I've been fooled before.

 

You might post some example values and what you want as a result for those.

 

If some values are out of expected ranges such as your 999999999 then you should provide expected ranges.

Contributor
Posts: 27

Re: Subsetting data

Hello- 

Sorry for the confusion- yes, they are SSN and it is numeric. Here are a few examples: 

209876896

498304981

 54376548

   4326583

 

I basically want to grab every observation that is 9 digits thus exclude the followings: 

0 / '.'/ 999999999/ 888888888 

 

I also found a few cases that are 9 digits, but start with 888 which I am not sure how to deal with, but would like to include them for now. 

 

Thank you! 

 

Super User
Posts: 13,304

Re: Subsetting data

In a data step:

 

if variablename in (. 0 99999999 888888888) then <do what ever>.

If you are actually creating a subset data with these values then

 

data want;

   set have;

   if variablename in (. 0 99999999 888888888);

run;

 

to exclude:

data want;

   set have;

   if variablename not in (. 0 99999999 888888888);

run;

 

Or use a dataset option where clause (where=( variablename not in (. 0 99999999 888888888))) if you don't want to create a separate data set but use the same data set for analysis.

 

Solution
‎09-14-2017 11:45 AM
Super User
Posts: 23,262

Re: Subsetting data

Search LexJansen.com for example macros/code:
http://analytics.ncsu.edu/sesug/2007/PO23.pdf

Contributor
Posts: 27

Re: Subsetting data

Thank you both! super helpful 

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 150 views
  • 0 likes
  • 3 in conversation