Hello SAS members,
I have a dataset with observations and variables like these:
id day workout
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
So person 1 records all day (1 to 5) as the person work out every day, but person 2 and 3 don't have workout records of day 1, 3 and day 4, 5. Since no records mean no workout, I want to add in observations of 0s for those people (person 2 and 3). How can I do that without manually entering data or 0s to each observation? I have more than 5,000 people in the data.
Thanks!
First, create a shell of all id/day values:
proc sql;
create table days as select distinct day from have;
create table ids as select distinct id from have;
create table shell as select * from days, ids order by id day;
quit;
Then merge them with the actual data. Assuming the actual data is sorted by ID DAY:
data want;
merge shell have (in=have_data);
by id day;
if have_data=0 then workout=0;
run;
First, create a shell of all id/day values:
proc sql;
create table days as select distinct day from have;
create table ids as select distinct id from have;
create table shell as select * from days, ids order by id day;
quit;
Then merge them with the actual data. Assuming the actual data is sorted by ID DAY:
data want;
merge shell have (in=have_data);
by id day;
if have_data=0 then workout=0;
run;
Another neat little trick you can do is to use the SPARSE option in PROC FREQ.
As long as there's a value somewhere it will be filled in.
proc freq data=have noprint;
table id * day / out = want sparse;
weight workout;
run;
@mkt_apprentice wrote:
Hello SAS members,
I have a dataset with observations and variables like these:
id day workout
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
So person 1 records all day (1 to 5) as the person work out every day, but person 2 and 3 don't have workout records of day 1, 3 and day 4, 5. Since no records mean no workout, I want to add in observations of 0s for those people (person 2 and 3). How can I do that without manually entering data or 0s to each observation? I have more than 5,000 people in the data.
Thanks!
Pretty straight forward once you know the max days
data have;
input id day workout ;
cards;
1 1 1
1 2 1
1 3 1
1 4 1
1 5 1
2 2 1
2 4 1
2 5 1
3 1 1
3 2 1
3 3 1
;
proc sql;
select max(day) into :d
from have;
quit;
data want;
array t(&d)_temporary_ (1:&d) ;
array j(&d) _temporary_;
call missing(of j(*));
do until(last.id);
set have;
by id;
j(day)=day;
end;
do _n_=1 to dim(t);
day=t(_n_);
workout=t(_n_)=j(_n_);
output;
end;
run;
using day as index instead of _n_ in the loop although no difference really as such
data want;
array t(&d)_temporary_ (1:&d) ;
array j(&d) _temporary_;
call missing(of j(*));
do until(last.id);
set have;
by id;
j(day)=day;
end;
do day=1 to dim(t);
workout=t(day)=j(day);
output;
end;
run;
Nearly 200 sessions are now available on demand with the SAS Innovate Digital Pass.
Explore Now →SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.