BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
chimei0403
Obsidian | Level 7

I need some guidance.

I would like to fill in missing year and create a dummy=0 by id.

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

I wish my final data look like below. Many thanks for the advice in advance!

 

2017    a    1
2018    a    1
2019    a    0
2020    a    1
2016    b    1
2017    b    0
2018    b    0
2019    b    0
2020    b    1
2018    c    1
2019    c    0
2020    c    1

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

One way that works with the provided example:

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

data want;
   set have;
   lid = lag(id);
   lyr = lag(year);
   if lid=id and ((year-lyr)>1) then do;
   /* store current year and number in temp variables*/
      tyr=year;
      tnum=number;
      do year=(lyr+1) to (tyr-1);
         number=0;
         output;
      end;
      /*restore from temp and write to output*/
      year=tyr;
      number=tnum;
      output;
   end;
   /* write to output when not a gap*/
   else output;
   keep year id number;
run;

If you have not seen the LAG function it is a way to get the value of a variable from previous records (cautions apply).

The loop goes through the "missing" year values.

The "trick" is storing the values so you can set the desired values and control when the data is written to the output data which is done with the OUTPUT instruction.


@chimei0403 wrote:

I need some guidance.

I would like to fill in missing year and create a dummy=0 by id.

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

I wish my final data look like below. Many thanks for the advice in advance!

 

2017    a    1
2018    a    1
2019    a    0
2020    a    1
2016    b    1
2017    b    0
2018    b    0
2019    b    0
2020    b    1
2018    c    1
2019    c    0
2020    c    1


 

View solution in original post

8 REPLIES 8
ballardw
Super User

One way that works with the provided example:

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

data want;
   set have;
   lid = lag(id);
   lyr = lag(year);
   if lid=id and ((year-lyr)>1) then do;
   /* store current year and number in temp variables*/
      tyr=year;
      tnum=number;
      do year=(lyr+1) to (tyr-1);
         number=0;
         output;
      end;
      /*restore from temp and write to output*/
      year=tyr;
      number=tnum;
      output;
   end;
   /* write to output when not a gap*/
   else output;
   keep year id number;
run;

If you have not seen the LAG function it is a way to get the value of a variable from previous records (cautions apply).

The loop goes through the "missing" year values.

The "trick" is storing the values so you can set the desired values and control when the data is written to the output data which is done with the OUTPUT instruction.


@chimei0403 wrote:

I need some guidance.

I would like to fill in missing year and create a dummy=0 by id.

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

I wish my final data look like below. Many thanks for the advice in advance!

 

2017    a    1
2018    a    1
2019    a    0
2020    a    1
2016    b    1
2017    b    0
2018    b    0
2019    b    0
2020    b    1
2018    c    1
2019    c    0
2020    c    1


 

chimei0403
Obsidian | Level 7
Thank you so much for the reply promptly! It works!!
chimei0403
Obsidian | Level 7

@ballardw @Ksharp @mkeintz 

Sorry to bother you all again. I think I miss one critical criteria. I try and modify codes but can't figure it out.  If the id only had  1 number record in a given year, the rest of the year of number variable would equal zero. Something like below. Wish to hear the advice. Many thanks! 

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
2016 d 1
2018 e 1 ; RUN;

 

 

Wish to fix the want dataset as below: 

2017    a    1
2018    a    1
2019    a    0
2020    a    1
2016    b    1
2017    b    0
2018    b    0
2019    b    0
2020    b    1
2018    c    1
2019    c    0
2020    c    1

2016   d     1

2017   d     0

2018   d     0

2019   d     0

2020   d     0

2018   e     1

2019   e     0

2020   e     0

 

mkeintz
PROC Star

So in addition to filling interior holes with the intervening years and number=0, you also want to similarly fill "trailing" holes through year=2020.  Now I will revert to recommending the self-merge with lookahead technique:

 

DATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
 2016 d   1
 2018 e   1
RUN;
data want (drop=nxt_:);
  merge have
        have (firstobs=2 keep=year id rename=(year=nxt_yr id=nxt_id));
  if nxt_id^=id then nxt_yr=2021;
  do year=year to nxt_yr-1;
    output;
    number=0;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
DATA HAVE;
 INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
 2016 d   1
 2018 e   1
;
RUN;

proc sql noprint;
create table temp as
select id,min(year) as min
 from have
  group by id;

select max(year) into :max from have;
quit;

data temp2;
 set temp;
 do year=min to &max.;
  output;
 end;
 keep id year;
run;

proc sql;
create table want as
select a.*,coalesce(b.number,0) as number 
 from temp2 as a left join have as b
  on a.id=b.id and a.year=b.year;
quit;
mkeintz
PROC Star

Ordinarily I'd recommend a self-merge with lookahead.  But this pair of SET statements (one of which does a lookahead) works just as well:

 

dATA HAVE;
	INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
RUN;


data want (drop=nxt_:);
  set have;
  by id;
  if end_of_nxt=0 then set have (firstobs=2 keep=year rename=(year=nxt_year)) end=end_of_nxt;
  output;
  if last.id=0;   /* Added after oversight noticed*/
  number=0;
  do while (year<nxt_year-1);
    year=year+1;
    output;
  end;
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
chimei0403
Obsidian | Level 7
Thank you so much! This approach also works! You guy are amazing!! Much appreciate it!!
Ksharp
Super User

If you have SAS/ETS module.

 

DATA HAVE;
 INPUT  YEAR ID $ NUMBER;
    CARDS;
 2017 a   1
 2018 a   1
 2020 a   1
 2016 b   1
 2020 b   1
 2018 c   1
 2020 c   1
;
RUN;

proc timeseries data=have out=want ;
by id;
id year interval=day format=best.;
var NUMBER / setmissing=0;
run;

proc print data=want;run;

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 616 views
  • 2 likes
  • 4 in conversation