BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
AVA_16
Obsidian | Level 7

I have a dataset with 4500 observations  with variables family_ids, patient_id, Diabetes (1/0), age, sex, BMI

Some of the patients do not belong to any family and have not been assigned any family id. for e.g, the data is like:

 

Family_ids patient

 1
 3
 4
 6
 9
120
121
190

 

There are 191 patients  with no family id. I would like to assign a family id to these patients other than the family ids already in the dataset. I would like to do so because the class identifier in my multilevel model is family_id. I would not like to delete these observations or assign a missing value to them.

 

How can it be done? Any help is appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Here is one way assuming that you have no more than 1000 actual Family Id "numbers" assigned.

Note that I treat Family_id and Patient as character variables as I would not expect to be doing arithmetic with them. If your values are actually numeric then the assignment for Family_id when missing would not involve the "put", just the Famnum variable.

 

data have;
   input Family_id $ patient $;
datalines;
.  1 
.  3 
.  4 
.  6 
.  9 
1 20 
1 21 
1 90 
. 6
;
run;


proc sort data=have;
  by patient;
run;

data want;
   set have;
   by patient;
   retain famnum 1000;
   if first.patient and missing(Family_id) then famnum+1;
   if missing(Family_id) then Family_id = put(famnum,best5. -L);
   drop famnum;
run;

An example of how to provide a data step of values.

The sort is to get all of the same patient records together, in case they do appear multiple times. I provided an example patient that appears twice.

The last data step uses the BY patient to increment the "family_id" when a patient is missing one and assign the new value.

 

Note that if you just happened to have an insurance policy number lying around that might be useable to group missing family id if they have the same policy information. Not a guarantee but one way to get somewhat better family membership for some.

View solution in original post

3 REPLIES 3
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Yes, but not enough information is provided.  Is family sequential, i.e. do you want any new ones added to the end, or fill gaps, or be completely distinct?  For instance, you could simply take max(family) then a datastep which assigns missing from max(family)+1 onwards.

 

Also, post test data in the form of a datastep, this is an essential practice in order to get good responses.

ballardw
Super User

Here is one way assuming that you have no more than 1000 actual Family Id "numbers" assigned.

Note that I treat Family_id and Patient as character variables as I would not expect to be doing arithmetic with them. If your values are actually numeric then the assignment for Family_id when missing would not involve the "put", just the Famnum variable.

 

data have;
   input Family_id $ patient $;
datalines;
.  1 
.  3 
.  4 
.  6 
.  9 
1 20 
1 21 
1 90 
. 6
;
run;


proc sort data=have;
  by patient;
run;

data want;
   set have;
   by patient;
   retain famnum 1000;
   if first.patient and missing(Family_id) then famnum+1;
   if missing(Family_id) then Family_id = put(famnum,best5. -L);
   drop famnum;
run;

An example of how to provide a data step of values.

The sort is to get all of the same patient records together, in case they do appear multiple times. I provided an example patient that appears twice.

The last data step uses the BY patient to increment the "family_id" when a patient is missing one and assign the new value.

 

Note that if you just happened to have an insurance policy number lying around that might be useable to group missing family id if they have the same policy information. Not a guarantee but one way to get somewhat better family membership for some.

AVA_16
Obsidian | Level 7
Thank you. Worked.

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 983 views
  • 1 like
  • 3 in conversation