BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
taylorcclarkson
Calcite | Level 5

I have a data set with IDs like: 100101, 100102, 201201. The first two numbers indicate location, the second two numbers indicate a family group, and the last two numbers indicate the individual within that location and family group. I need to break these up into variables for the family. 

Right now I have this: 

 

data new;
set original;
if 100100<id<100200 then family=1;
if 100200<id<100300 then family=2;
if 100300<id<100400 then family=3;

etc. 

run;

 

I have 2,000 observations and hundreds of families so this method is going to take forever. I'm not very skilled at macros yet, is there a way I can code a macro to process this for me? 

1 ACCEPTED SOLUTION
5 REPLIES 5
PaigeMiller
Diamond | Level 26

Macros not needed here. This can be done with data step statements.

 

However, you say there are "hundreds of families", what would say the 157th family look like in this ID variable? You also say the family number is two digits, so it can't be 157. Please explain this further.

 

Also, is the variable named ID numeric or character according to PROC CONTENTS?

 

--
Paige Miller
taylorcclarkson
Calcite | Level 5

The families reset with each new group, so each group has ~20 families, each labeled from 1-20. 

ID variable is numeric. 

ballardw
Super User

@taylorcclarkson wrote:

The families reset with each new group, so each group has ~20 families, each labeled from 1-20. 

ID variable is numeric. 


I would say that extracting just the family from such a description would be incomplete. Unless you have already figured out and added a GROUP variable.

 

Note: you could likely keep all of the group/family together just using a custom format

data example;
   input id;
datalines;
100101
100102
100103
100201
100202
100203
100204
110101
110102
110201
110202
110203
;

proc format;
picture grpfam 
low-high ='9999' (mult=0.01)
;

proc print data=example;
   format id grpfam.;
run;

The above custom format only displays the first 4 digits of the id. (it shifts the decimal place of the ID variable with the mult and then doesn't display any decimal values by default).

 

The groups created by formats are honored by most procedures so you can do things like:

proc freq data=example;
   tables id;
   format id grpfam.;
run;

to count family members.

Or to use the grp/family as a group in a regression or similar procedure.

 

Since formats allow you to work with the formatted values in most places they can be very efficient ways to make different tables, analyses or graphs without having to go back and keep adding variables to the data set.

 

And  just changing the digit selectors in the format:

proc format;
picture fam 
low-high ='99' (mult=0.01)
;

will display the 2 digits of the family.

 

Tom
Super User Tom
Super User

You can also just use arithmetic to pull out the values.

data new;
  set original;
  individual = mod(id,100);
  family = int(id/100);
  group = int(family/100);
  family = mod(family,100);
run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 528 views
  • 0 likes
  • 5 in conversation