BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TrueTears
Obsidian | Level 7

Say I have a subsample of my data that looks as follows:

 

year event_year change id
2007 2009   3
2008 2009   3
2009 2009 1 3
2010 2009   3
2011 2009   3
2012 2009   3
2002 2004   55
2003 2004   55
2004 2004 0 55
2005 2004   55

 

Whenever the variable change takes a value of 1 for year = event_ye, I want to create a new column for that id only called event_year_new where it is equal to event_year+1. If change = 0, then nothing happens and event_year = event_year_new:

 

year event_year change id event_year_new
2007 2009   3 2010
2008 2009   3 2010
2009 2009 1 3 2010
2010 2009   3 2010
2011 2009   3 2010
2012 2009   3 2010
2002 2004   55 2004
2003 2004   55 2004
2004 2004 0 55 2004
2005 2004   55 2004

 

How could I do this? Of course the below code does not work because it only changes it for the case when year = event_year, but I want it to apply to each ID.

 


if year = event_year and change=1 then
      do;
event_year_new = event_ye+1;
      end;
1 ACCEPTED SOLUTION

Accepted Solutions
mkeintz
PROC Star

Assuming each id never has more than one change value, and that the data are sorted by id, then:

 

data have;
  input year	event_year	@17 change 1.	id;
datalines;
2007	2009	 	3
2008	2009	 	3
2009	2009	1	3
2010	2009	 	3
2011	2009	 	3
2012	2009	 	3
2002	2004	 	55
2003	2004	 	55
2004	2004	0	55
2005	2004	 	55
run;

data want;
  set have (where=(change^=.) in=inchange)
      have (in=inwanted);
  by id;

  if first.id then do;
    if change=1 then event_year_new=event_year+1; else
    if change=0 then event_year_new=event_year;  else
    event_year_new=.;
  end;
  retain event_year_new;
  if inwanted;
run;

For each id, this reads the change record prior to reading all the records.  So the event_year_new can be established at the start of each id.  Leaving the event_year_new variable retained propagates it through all the records for that id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

View solution in original post

4 REPLIES 4
mkeintz
PROC Star

Question:  what if a given ID has more than one non-missing value for change?  Do you plan to make multiple new columns in such cases?

 

 

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
TrueTears
Obsidian | Level 7

No, the change variable only has values for when year = event_year. It is always missing for other cases.

mkeintz
PROC Star

Assuming each id never has more than one change value, and that the data are sorted by id, then:

 

data have;
  input year	event_year	@17 change 1.	id;
datalines;
2007	2009	 	3
2008	2009	 	3
2009	2009	1	3
2010	2009	 	3
2011	2009	 	3
2012	2009	 	3
2002	2004	 	55
2003	2004	 	55
2004	2004	0	55
2005	2004	 	55
run;

data want;
  set have (where=(change^=.) in=inchange)
      have (in=inwanted);
  by id;

  if first.id then do;
    if change=1 then event_year_new=event_year+1; else
    if change=0 then event_year_new=event_year;  else
    event_year_new=.;
  end;
  retain event_year_new;
  if inwanted;
run;

For each id, this reads the change record prior to reading all the records.  So the event_year_new can be established at the start of each id.  Leaving the event_year_new variable retained propagates it through all the records for that id.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Patrick
Opal | Level 21

Does below work for your actual data?

data have;
  infile datalines dsd truncover;
  input year event_year change id;
  datalines;
2007,2009, ,3
2008,2009, ,3
2009,2009,1,3
2010,2009, ,3
2011,2009, ,3
2012,2009, ,3
2002,2004, ,55
2003,2004, ,55
2004,2004,0,55
2005,2004, ,55
;

data want(drop=_:);
  merge 
    have 
    have( keep=id change
          where=(not missing(_change))
          rename=(change=_change)
        );
  by id;
  event_year_new=sum(event_year,_change);
run;

proc print data=want;
run;

Patrick_0-1595045852723.png

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 1352 views
  • 1 like
  • 3 in conversation