BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TrueTears
Obsidian | Level 7

Say I have a subsample of my data that looks as follows:

 

year event_year change id
2007 2009   3
2008 2009   3
2009 2009 1 3
2010 2009   3
2011 2009   3
2012 2009   3
2002 2004   55
2003 2004   55
2004 2004 0 55
2005 2004   55

 

Whenever the variable change takes a value of 1 for year = event_ye, I want to create a new column for that id only called event_year_new where it is equal to event_year+1. If change = 0, then nothing happens and event_year = event_year_new:

 

year event_year change id event_year_new
2007 2009   3 2010
2008 2009   3 2010
2009 2009 1 3 2010
2010 2009   3 2010
2011 2009   3 2010
2012 2009   3 2010
2002 2004   55 2004
2003 2004   55 2004
2004 2004 0 55 2004
2005 2004   55 2004

 

How could I do this? Of course the below code does not work because it only changes it for the case when year = event_year, but I want it to apply to each ID.

 


if year = event_year and change=1 then
      do;
event_year_new = event_ye+1;
      end;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming each id never has more than one change value, and that the data are sorted by id, then:

 

data have;
  input year	event_year	@17 change 1.	id;
datalines;
2007	2009	 	3
2008	2009	 	3
2009	2009	1	3
2010	2009	 	3
2011	2009	 	3
2012	2009	 	3
2002	2004	 	55
2003	2004	 	55
2004	2004	0	55
2005	2004	 	55
run;

data want;
  set have (where=(change^=.) in=inchange)
      have (in=inwanted);
  by id;

  if first.id then do;
    if change=1 then event_year_new=event_year+1; else
    if change=0 then event_year_new=event_year;  else
    event_year_new=.;
  end;
  retain event_year_new;
  if inwanted;
run;

For each id, this reads the change record prior to reading all the records.  So the event_year_new can be established at the start of each id.  Leaving the event_year_new variable retained propagates it through all the records for that id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

Question:  what if a given ID has more than one non-missing value for change?  Do you plan to make multiple new columns in such cases?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TrueTears
Obsidian | Level 7

No, the change variable only has values for when year = event_year. It is always missing for other cases.

mkeintz
PROC Star

Assuming each id never has more than one change value, and that the data are sorted by id, then:

 

data have;
  input year	event_year	@17 change 1.	id;
datalines;
2007	2009	 	3
2008	2009	 	3
2009	2009	1	3
2010	2009	 	3
2011	2009	 	3
2012	2009	 	3
2002	2004	 	55
2003	2004	 	55
2004	2004	0	55
2005	2004	 	55
run;

data want;
  set have (where=(change^=.) in=inchange)
      have (in=inwanted);
  by id;

  if first.id then do;
    if change=1 then event_year_new=event_year+1; else
    if change=0 then event_year_new=event_year;  else
    event_year_new=.;
  end;
  retain event_year_new;
  if inwanted;
run;

For each id, this reads the change record prior to reading all the records.  So the event_year_new can be established at the start of each id.  Leaving the event_year_new variable retained propagates it through all the records for that id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Does below work for your actual data?

data have;
  infile datalines dsd truncover;
  input year event_year change id;
  datalines;
2007,2009, ,3
2008,2009, ,3
2009,2009,1,3
2010,2009, ,3
2011,2009, ,3
2012,2009, ,3
2002,2004, ,55
2003,2004, ,55
2004,2004,0,55
2005,2004, ,55
;

data want(drop=_:);
  merge 
    have 
    have( keep=id change
          where=(not missing(_change))
          rename=(change=_change)
        );
  by id;
  event_year_new=sum(event_year,_change);
run;

proc print data=want;
run;

Patrick_0-1595045852723.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1442 views
  • 1 like
  • 3 in conversation