I need some guidance.
I would like to fill in missing year and create a dummy=0 by id.
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN;
I wish my final data look like below. Many thanks for the advice in advance!
2017 a 1
2018 a 1
2019 a 0
2020 a 1
2016 b 1
2017 b 0
2018 b 0
2019 b 0
2020 b 1
2018 c 1
2019 c 0
2020 c 1
One way that works with the provided example:
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN; data want; set have; lid = lag(id); lyr = lag(year); if lid=id and ((year-lyr)>1) then do; /* store current year and number in temp variables*/ tyr=year; tnum=number; do year=(lyr+1) to (tyr-1); number=0; output; end; /*restore from temp and write to output*/ year=tyr; number=tnum; output; end; /* write to output when not a gap*/ else output; keep year id number; run;
If you have not seen the LAG function it is a way to get the value of a variable from previous records (cautions apply).
The loop goes through the "missing" year values.
The "trick" is storing the values so you can set the desired values and control when the data is written to the output data which is done with the OUTPUT instruction.
@chimei0403 wrote:
I need some guidance.
I would like to fill in missing year and create a dummy=0 by id.
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN;I wish my final data look like below. Many thanks for the advice in advance!
2017 a 1
2018 a 1
2019 a 0
2020 a 1
2016 b 1
2017 b 0
2018 b 0
2019 b 0
2020 b 1
2018 c 1
2019 c 0
2020 c 1
One way that works with the provided example:
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN; data want; set have; lid = lag(id); lyr = lag(year); if lid=id and ((year-lyr)>1) then do; /* store current year and number in temp variables*/ tyr=year; tnum=number; do year=(lyr+1) to (tyr-1); number=0; output; end; /*restore from temp and write to output*/ year=tyr; number=tnum; output; end; /* write to output when not a gap*/ else output; keep year id number; run;
If you have not seen the LAG function it is a way to get the value of a variable from previous records (cautions apply).
The loop goes through the "missing" year values.
The "trick" is storing the values so you can set the desired values and control when the data is written to the output data which is done with the OUTPUT instruction.
@chimei0403 wrote:
I need some guidance.
I would like to fill in missing year and create a dummy=0 by id.
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN;I wish my final data look like below. Many thanks for the advice in advance!
2017 a 1
2018 a 1
2019 a 0
2020 a 1
2016 b 1
2017 b 0
2018 b 0
2019 b 0
2020 b 1
2018 c 1
2019 c 0
2020 c 1
Sorry to bother you all again. I think I miss one critical criteria. I try and modify codes but can't figure it out. If the id only had 1 number record in a given year, the rest of the year of number variable would equal zero. Something like below. Wish to hear the advice. Many thanks!
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1
2016 d 1
2018 e 1 ; RUN;
Wish to fix the want dataset as below:
2017 a 1
2018 a 1
2019 a 0
2020 a 1
2016 b 1
2017 b 0
2018 b 0
2019 b 0
2020 b 1
2018 c 1
2019 c 0
2020 c 1
2016 d 1
2017 d 0
2018 d 0
2019 d 0
2020 d 0
2018 e 1
2019 e 0
2020 e 0
So in addition to filling interior holes with the intervening years and number=0, you also want to similarly fill "trailing" holes through year=2020. Now I will revert to recommending the self-merge with lookahead technique:
DATA HAVE;
INPUT YEAR ID $ NUMBER;
CARDS;
2017 a 1
2018 a 1
2020 a 1
2016 b 1
2020 b 1
2018 c 1
2020 c 1
2016 d 1
2018 e 1
RUN;
data want (drop=nxt_:);
merge have
have (firstobs=2 keep=year id rename=(year=nxt_yr id=nxt_id));
if nxt_id^=id then nxt_yr=2021;
do year=year to nxt_yr-1;
output;
number=0;
end;
run;
DATA HAVE;
INPUT YEAR ID $ NUMBER;
CARDS;
2017 a 1
2018 a 1
2020 a 1
2016 b 1
2020 b 1
2018 c 1
2020 c 1
2016 d 1
2018 e 1
;
RUN;
proc sql noprint;
create table temp as
select id,min(year) as min
from have
group by id;
select max(year) into :max from have;
quit;
data temp2;
set temp;
do year=min to &max.;
output;
end;
keep id year;
run;
proc sql;
create table want as
select a.*,coalesce(b.number,0) as number
from temp2 as a left join have as b
on a.id=b.id and a.year=b.year;
quit;
Ordinarily I'd recommend a self-merge with lookahead. But this pair of SET statements (one of which does a lookahead) works just as well:
dATA HAVE;
INPUT YEAR ID $ NUMBER;
CARDS;
2017 a 1
2018 a 1
2020 a 1
2016 b 1
2020 b 1
2018 c 1
2020 c 1
RUN;
data want (drop=nxt_:);
set have;
by id;
if end_of_nxt=0 then set have (firstobs=2 keep=year rename=(year=nxt_year)) end=end_of_nxt;
output;
if last.id=0; /* Added after oversight noticed*/
number=0;
do while (year<nxt_year-1);
year=year+1;
output;
end;
run;
If you have SAS/ETS module.
DATA HAVE; INPUT YEAR ID $ NUMBER; CARDS; 2017 a 1 2018 a 1 2020 a 1 2016 b 1 2020 b 1 2018 c 1 2020 c 1 ; RUN; proc timeseries data=have out=want ; by id; id year interval=day format=best.; var NUMBER / setmissing=0; run; proc print data=want;run;
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.