Hello SAS members,
I have a dataset with observations and variables like these:
id day workout
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
So person 1 records all day (1 to 5) as the person work out every day, but person 2 and 3 don't have workout records of day 1, 3 and day 4, 5. Since no records mean no workout, I want to add in observations of 0s for those people (person 2 and 3). How can I do that without manually entering data or 0s to each observation? I have more than 5,000 people in the data.
Thanks!
First, create a shell of all id/day values:
proc sql;
create table days as select distinct day from have;
create table ids as select distinct id from have;
create table shell as select * from days, ids order by id day;
quit;
Then merge them with the actual data. Assuming the actual data is sorted by ID DAY:
data want;
merge shell have (in=have_data);
by id day;
if have_data=0 then workout=0;
run;
First, create a shell of all id/day values:
proc sql;
create table days as select distinct day from have;
create table ids as select distinct id from have;
create table shell as select * from days, ids order by id day;
quit;
Then merge them with the actual data. Assuming the actual data is sorted by ID DAY:
data want;
merge shell have (in=have_data);
by id day;
if have_data=0 then workout=0;
run;
Another neat little trick you can do is to use the SPARSE option in PROC FREQ.
As long as there's a value somewhere it will be filled in.
proc freq data=have noprint;
table id * day / out = want sparse;
weight workout;
run;
@mkt_apprentice wrote:
Hello SAS members,
I have a dataset with observations and variables like these:
id day workout
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
So person 1 records all day (1 to 5) as the person work out every day, but person 2 and 3 don't have workout records of day 1, 3 and day 4, 5. Since no records mean no workout, I want to add in observations of 0s for those people (person 2 and 3). How can I do that without manually entering data or 0s to each observation? I have more than 5,000 people in the data.
Thanks!
Pretty straight forward once you know the max days
data have;
input id day workout ;
cards;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
;
proc sql;
select max(day) into :d
from have;
quit;
data want;
array t(&d)_temporary_ (1:&d) ;
array j(&d) _temporary_;
call missing(of j(*));
do until(last.id);
set have;
by id;
j(day)=day;
end;
do _n_=1 to dim(t);
day=t(_n_);
workout=t(_n_)=j(_n_);
output;
end;
run;
using day as index instead of _n_ in the loop although no difference really as such
data want;
array t(&d)_temporary_ (1:&d) ;
array j(&d) _temporary_;
call missing(of j(*));
do until(last.id);
set have;
by id;
j(day)=day;
end;
do day=1 to dim(t);
workout=t(day)=j(day);
output;
end;
run;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.