☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
Calcite | Level 5

## How to convert long numeric ID into multiple variables

I have a data set with IDs like: 100101, 100102, 201201. The first two numbers indicate location, the second two numbers indicate a family group, and the last two numbers indicate the individual within that location and family group. I need to break these up into variables for the family.

Right now I have this:

data new;
set original;
if 100100<id<100200 then family=1;
if 100200<id<100300 then family=2;
if 100300<id<100400 then family=3;

etc.

run;

I have 2,000 observations and hundreds of families so this method is going to take forever. I'm not very skilled at macros yet, is there a way I can code a macro to process this for me?

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: How to convert long numeric ID into multiple variables

``````length location family individual \$2 int \$6;
int = put(id,z6.);
location = substr(int,1,2);
family = substr(int,3,2);
individual = substr(int,5,2);
drop int;``````
5 REPLIES 5
Diamond | Level 26

## Re: How to convert long numeric ID into multiple variables

Macros not needed here. This can be done with data step statements.

However, you say there are "hundreds of families", what would say the 157th family look like in this ID variable? You also say the family number is two digits, so it can't be 157. Please explain this further.

Also, is the variable named ID numeric or character according to PROC CONTENTS?

--
Paige Miller
Calcite | Level 5

## Re: How to convert long numeric ID into multiple variables

The families reset with each new group, so each group has ~20 families, each labeled from 1-20.

ID variable is numeric.

Super User

## Re: How to convert long numeric ID into multiple variables

@taylorcclarkson wrote:

The families reset with each new group, so each group has ~20 families, each labeled from 1-20.

ID variable is numeric.

I would say that extracting just the family from such a description would be incomplete. Unless you have already figured out and added a GROUP variable.

Note: you could likely keep all of the group/family together just using a custom format

```data example;
input id;
datalines;
100101
100102
100103
100201
100202
100203
100204
110101
110102
110201
110202
110203
;

proc format;
picture grpfam
low-high ='9999' (mult=0.01)
;

proc print data=example;
format id grpfam.;
run;```

The above custom format only displays the first 4 digits of the id. (it shifts the decimal place of the ID variable with the mult and then doesn't display any decimal values by default).

The groups created by formats are honored by most procedures so you can do things like:

```proc freq data=example;
tables id;
format id grpfam.;
run;```

to count family members.

Or to use the grp/family as a group in a regression or similar procedure.

Since formats allow you to work with the formatted values in most places they can be very efficient ways to make different tables, analyses or graphs without having to go back and keep adding variables to the data set.

And  just changing the digit selectors in the format:

```proc format;
picture fam
low-high ='99' (mult=0.01)
;
```

will display the 2 digits of the family.

Super User

## Re: How to convert long numeric ID into multiple variables

``````length location family individual \$2 int \$6;
int = put(id,z6.);
location = substr(int,1,2);
family = substr(int,3,2);
individual = substr(int,5,2);
drop int;``````
Super User

## Re: How to convert long numeric ID into multiple variables

You can also just use arithmetic to pull out the values.

``````data new;
set original;
individual = mod(id,100);
family = int(id/100);
group = int(family/100);
family = mod(family,100);
run;
``````
Discussion stats
• 5 replies
• 518 views
• 0 likes
• 5 in conversation