Hello everyone,
My dataset is the following:
Person | Person_num | Year | Firstyear | Company |
A | 1 | 2005 | 1 | aa |
A | 1 | 2006 | 0 | aa |
A | 1 | 2007 | 0 | aa |
A | 1 | 2008 | 0 | aa |
A | 1 | 2008 | 0 | bb |
A | 1 | 2009 | 0 | aa |
A | 1 | 2009 | 0 | bb |
B | 2 | 2008 | 1 | cc |
B | 2 | 2009 | 0 | dd |
B | 2 | 2009 | 0 | cc |
B | 2 | 2010 | 0 | dd |
B | 2 | 2011 | 0 | dd |
B | 2 | 2012 | 0 | dd |
C | 3 | 2002 | 1 | ee |
C | 3 | 2002 | 1 | ff |
C | 3 | 2003 | 0 | ee |
C | 3 | 2003 | 0 | ff |
C | 3 | 2004 | 0 | ff |
C | 3 | 2008 | 0 | ff |
C | 3 | 2009 | 0 | ff |
C | 3 | 2010 | 0 | ff |
C | 3 | 2011 | 0 | ff |
C | 3 | 2012 | 0 | ff |
C | 3 | 2012 | 0 | gg |
C | 3 | 2013 | 0 | ff |
C | 3 | 2014 | 0 | ff |
D | 4 | 2005 | 1 | hh |
D | 4 | 2006 | 0 | hh |
D | 4 | 2010 | 0 | ii |
D | 4 | 2011 | 0 | ii |
D | 4 | 2012 | 0 | ii |
D | 4 | 2014 | 0 | jj |
D | 4 | 2015 | 0 | jj |
The expected outcome is the following:
Person | Person_num | Year | Firstyear | Company | Save | Want |
A | 1 | 2005 | 1 | aa | aa | 1 |
A | 1 | 2006 | 0 | aa | aa | 0 |
A | 1 | 2007 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | bb | aa | 0 |
A | 1 | 2009 | 0 | aa | aa | 0 |
A | 1 | 2009 | 0 | bb | aa | 0 |
B | 2 | 2008 | 1 | cc | cc | 1 |
B | 2 | 2009 | 0 | dd | cc | 0 |
B | 2 | 2009 | 0 | cc | cc | 0 |
B | 2 | 2010 | 0 | dd | dd | 1 |
B | 2 | 2011 | 0 | dd | dd | 0 |
B | 2 | 2012 | 0 | dd | dd | 0 |
C | 3 | 2002 | 1 | ee | {ee, ff} | 1 |
C | 3 | 2002 | 1 | ff | {ee, ff} | 1 |
C | 3 | 2003 | 0 | ee | {ee, ff} | 0 |
C | 3 | 2003 | 0 | ff | {ee, ff} | 0 |
C | 3 | 2004 | 0 | ff | ff | 0 |
C | 3 | 2008 | 0 | ff | ff | 0 |
C | 3 | 2009 | 0 | ff | ff | 0 |
C | 3 | 2010 | 0 | ff | ff | 0 |
C | 3 | 2011 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | gg | ff | 0 |
C | 3 | 2013 | 0 | ff | ff | 0 |
C | 3 | 2014 | 0 | ff | ff | 0 |
D | 4 | 2005 | 1 | hh | hh | 1 |
D | 4 | 2006 | 0 | hh | hh | 0 |
D | 4 | 2010 | 0 | ii | ii | 1 |
D | 4 | 2011 | 0 | ii | ii | 0 |
D | 4 | 2012 | 0 | ii | ii | 0 |
D | 4 | 2014 | 0 | jj | jj | 1 |
D | 4 | 2015 | 0 | jj | jj | 0 |
The logic behind is the following:
else if the company in save then save = lag(save) and want = 0;
else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;
else save = code and want = 1;
My attempt is the following:
data have;
input Person $ Person_num Year Firstyear Company $;
datalines;
A 1 2005 1 aa
A 1 2006 0 aa
A 1 2007 0 aa
A 1 2008 0 aa
A 1 2008 0 bb
A 1 2009 0 aa
A 1 2009 0 bb
B 2 2008 1 cc
B 2 2009 0 dd
B 2 2009 0 cc
B 2 2010 0 dd
B 2 2011 0 dd
B 2 2012 0 dd
C 3 2002 1 ee
C 3 2002 1 ff
C 3 2003 0 ee
C 3 2003 0 ff
C 3 2004 0 ff
C 3 2008 0 ff
C 3 2009 0 ff
C 3 2010 0 ff
C 3 2011 0 ff
C 3 2012 0 ff
C 3 2012 0 gg
C 3 2013 0 ff
C 3 2014 0 ff
D 4 2005 1 hh
D 4 2006 0 hh
D 4 2010 0 ii
D 4 2011 0 ii
D 4 2012 0 ii
D 4 2014 0 jj
D 4 2015 0 jj
;
data want; set have;
do i = 1 to 4;
if person_num = i then do;
if firstyear = 1 then want = 1 and save = code; *How can assign multiple variables into the save variable when Person = C;
if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0;
else if company in save then save = lag(save) and want = 0;
else if; *I have no idea in this part.;
else save = code and want = 1;
end;
end;
run;
Please show us the desired output.
Want variable is the desired output.
"If previous year observations within the same identifier include the code(P1), then want variable should be zero."
Why then does the 4th observation (A, P3, 2002) have want = 1 ?
Because obs4's code is P3 (neither P1 nor P2), want variable has the value of 1.
What is the logic for the last observation having want = 0?
Q) What is the logic for the last observation having want = 0?
A) Since the last observation's code is P11 and the previous observations of the identifier D already have P11, the last want variable should be 0.
Want variable is not an identifier-level variable. It is an identifier-year-level variable
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9.
Save $200 when you sign up by March 14!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.