Hello,
I have a data set for apartments with these variables:
Building_id: ID for the building that the apartment is located in.
Apartment_id: ID for the apartment itself.
I would like to generate a variable, Group_id, that groups the apartments together in groups. These groups have to include a minimum of 12 apartments and these apartments need the same building_id.
So, if there are 23 apartments with the same building_id they would simply be in one group, because there are not enough apartments for 2 groups of 12. If there are 24 apartments with the same building_id, they would be in two groups of 12. If there are 25 apartments with the same building_id, they would be in two groups of 12 and 13.
Another example: if there are 100 apartments in the same building (with the same building_id), I want the first 12 to be grouped with one Group_id, 13-24 to be grouped with another Group_id, 25-36 with a third Group_id,..., and then the last apartments from 84-100 to be grouped together with the same Group_id. The last group includes 16 apartments (larger than 12), but if we grouped 84-96 together, then the last 4 apartments would be too few to be in a group, because of the 12 minimum rule.
My data set include a lot of apartments with a lot of different building-id's, so I am hoping for a code that does this grouping automatically.
I hope this makes sense - please ask away if it doesn't. Hoping someone can help :-).
With computers, one should always start counting at zero 😉
data want;
count = 0;
do until (last.building_id);
set have;
by building_id;
count + 1;
end;
group_id = 0;
count2 = 0;
do until (last.building_id);
set have;
by building_id;
if mod(count2,12) = 0 and count - count2 ge 12 then group_id + 1;
count2 + 1;
output;
end;
drop count count2;
run;
See this (with simple numeric ID's):
data have;
building_id = 1;
do appartment_id = 1 to 23;
output;
end;
building_id = 2;
do appartment_id = 1 to 24;
output;
end;
building_id = 3;
do appartment_id = 1 to 25;
output;
end;
run;
data want;
count = 0;
do until (last.building_id);
set have;
by building_id;
count + 1;
end;
group_id = 1;
count2 = 0;
do until (last.building_id);
set have;
by building_id;
count2 + 1;
if mod(count2,12) = 0 and count - count2 ge 12 then group_id + 1;
output;
end;
drop count count2;
run;
With computers, one should always start counting at zero 😉
data want;
count = 0;
do until (last.building_id);
set have;
by building_id;
count + 1;
end;
group_id = 0;
count2 = 0;
do until (last.building_id);
set have;
by building_id;
if mod(count2,12) = 0 and count - count2 ge 12 then group_id + 1;
count2 + 1;
output;
end;
drop count count2;
run;
Something like this perhaps?
data want;
do count=1 by 1 until(last.building_ID);
set have(keep=building_ID);
by Building_ID;
end;
group_ID+1;
do _N_=1 by 1 until(last.building_ID);
set have;
by building_ID;
output;
if mod(_N_,12)=0 and count-_N_>12 then
group_ID+1;
end;
run;
The first DO loop just counts the number of apartments in each building, the second reads the data again and outputs, and for each 12 observations it adds to to the group_ID if there are enough apartments left.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.