- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Hello everyone,
My dataset is the following:
Person | Person_num | Year | Firstyear | Company |
A | 1 | 2005 | 1 | aa |
A | 1 | 2006 | 0 | aa |
A | 1 | 2007 | 0 | aa |
A | 1 | 2008 | 0 | aa |
A | 1 | 2008 | 0 | bb |
A | 1 | 2009 | 0 | aa |
A | 1 | 2009 | 0 | bb |
B | 2 | 2008 | 1 | cc |
B | 2 | 2009 | 0 | dd |
B | 2 | 2009 | 0 | cc |
B | 2 | 2010 | 0 | dd |
B | 2 | 2011 | 0 | dd |
B | 2 | 2012 | 0 | dd |
C | 3 | 2002 | 1 | ee |
C | 3 | 2002 | 1 | ff |
C | 3 | 2003 | 0 | ee |
C | 3 | 2003 | 0 | ff |
C | 3 | 2004 | 0 | ff |
C | 3 | 2008 | 0 | ff |
C | 3 | 2009 | 0 | ff |
C | 3 | 2010 | 0 | ff |
C | 3 | 2011 | 0 | ff |
C | 3 | 2012 | 0 | ff |
C | 3 | 2012 | 0 | gg |
C | 3 | 2013 | 0 | ff |
C | 3 | 2014 | 0 | ff |
D | 4 | 2005 | 1 | hh |
D | 4 | 2006 | 0 | hh |
D | 4 | 2010 | 0 | ii |
D | 4 | 2011 | 0 | ii |
D | 4 | 2012 | 0 | ii |
D | 4 | 2014 | 0 | jj |
D | 4 | 2015 | 0 | jj |
The expected outcome is the following:
Person | Person_num | Year | Firstyear | Company | Save | Want |
A | 1 | 2005 | 1 | aa | aa | 1 |
A | 1 | 2006 | 0 | aa | aa | 0 |
A | 1 | 2007 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | bb | aa | 0 |
A | 1 | 2009 | 0 | aa | aa | 0 |
A | 1 | 2009 | 0 | bb | aa | 0 |
B | 2 | 2008 | 1 | cc | cc | 1 |
B | 2 | 2009 | 0 | dd | cc | 0 |
B | 2 | 2009 | 0 | cc | cc | 0 |
B | 2 | 2010 | 0 | dd | dd | 1 |
B | 2 | 2011 | 0 | dd | dd | 0 |
B | 2 | 2012 | 0 | dd | dd | 0 |
C | 3 | 2002 | 1 | ee | {ee, ff} | 1 |
C | 3 | 2002 | 1 | ff | {ee, ff} | 1 |
C | 3 | 2003 | 0 | ee | {ee, ff} | 0 |
C | 3 | 2003 | 0 | ff | {ee, ff} | 0 |
C | 3 | 2004 | 0 | ff | ff | 0 |
C | 3 | 2008 | 0 | ff | ff | 0 |
C | 3 | 2009 | 0 | ff | ff | 0 |
C | 3 | 2010 | 0 | ff | ff | 0 |
C | 3 | 2011 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | gg | ff | 0 |
C | 3 | 2013 | 0 | ff | ff | 0 |
C | 3 | 2014 | 0 | ff | ff | 0 |
D | 4 | 2005 | 1 | hh | hh | 1 |
D | 4 | 2006 | 0 | hh | hh | 0 |
D | 4 | 2010 | 0 | ii | ii | 1 |
D | 4 | 2011 | 0 | ii | ii | 0 |
D | 4 | 2012 | 0 | ii | ii | 0 |
D | 4 | 2014 | 0 | jj | jj | 1 |
D | 4 | 2015 | 0 | jj | jj | 0 |
The logic behind is the following:
- If firstyear = 1 then want = 1;
- If firstyear = 1, save = {companies}. That is, save include all companies when firstyear = 1. (Is it possible to save multiple variables into the save variable?)
- If firstyear ne 0 and company = lag(save), then save = lag(save) and want = 0;
else if the company in save then save = lag(save) and want = 0;
else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;
else save = code and want = 1;
My attempt is the following:
data have;
input Person $ Person_num Year Firstyear Company $;
datalines;
A 1 2005 1 aa
A 1 2006 0 aa
A 1 2007 0 aa
A 1 2008 0 aa
A 1 2008 0 bb
A 1 2009 0 aa
A 1 2009 0 bb
B 2 2008 1 cc
B 2 2009 0 dd
B 2 2009 0 cc
B 2 2010 0 dd
B 2 2011 0 dd
B 2 2012 0 dd
C 3 2002 1 ee
C 3 2002 1 ff
C 3 2003 0 ee
C 3 2003 0 ff
C 3 2004 0 ff
C 3 2008 0 ff
C 3 2009 0 ff
C 3 2010 0 ff
C 3 2011 0 ff
C 3 2012 0 ff
C 3 2012 0 gg
C 3 2013 0 ff
C 3 2014 0 ff
D 4 2005 1 hh
D 4 2006 0 hh
D 4 2010 0 ii
D 4 2011 0 ii
D 4 2012 0 ii
D 4 2014 0 jj
D 4 2015 0 jj
;
data want; set have;
do i = 1 to 4;
if person_num = i then do;
if firstyear = 1 then want = 1 and save = code; *How can assign multiple variables into the save variable when Person = C;
if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0;
else if company in save then save = lag(save) and want = 0;
else if; *I have no idea in this part.;
else save = code and want = 1;
end;
end;
run;
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Please show us the desired output.
Paige Miller
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Want variable is the desired output.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
"If previous year observations within the same identifier include the code(P1), then want variable should be zero."
Why then does the 4th observation (A, P3, 2002) have want = 1 ?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Because obs4's code is P3 (neither P1 nor P2), want variable has the value of 1.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
What is the logic for the last observation having want = 0?
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Q) What is the logic for the last observation having want = 0?
A) Since the last observation's code is P11 and the previous observations of the identifier D already have P11, the last want variable should be 0.
- Mark as New
- Bookmark
- Subscribe
- Mute
- RSS Feed
- Permalink
- Report Inappropriate Content
Want variable is not an identifier-level variable. It is an identifier-year-level variable