Retain multiple variables and use subset within do loop
Hello everyone,
My dataset is the following:
Person | Person_num | Year | Firstyear | Company |
A | 1 | 2005 | 1 | aa |
A | 1 | 2006 | 0 | aa |
A | 1 | 2007 | 0 | aa |
A | 1 | 2008 | 0 | aa |
A | 1 | 2008 | 0 | bb |
A | 1 | 2009 | 0 | aa |
A | 1 | 2009 | 0 | bb |
B | 2 | 2008 | 1 | cc |
B | 2 | 2009 | 0 | dd |
B | 2 | 2009 | 0 | cc |
B | 2 | 2010 | 0 | dd |
B | 2 | 2011 | 0 | dd |
B | 2 | 2012 | 0 | dd |
C | 3 | 2002 | 1 | ee |
C | 3 | 2002 | 1 | ff |
C | 3 | 2003 | 0 | ee |
C | 3 | 2003 | 0 | ff |
C | 3 | 2004 | 0 | ff |
C | 3 | 2008 | 0 | ff |
C | 3 | 2009 | 0 | ff |
C | 3 | 2010 | 0 | ff |
C | 3 | 2011 | 0 | ff |
C | 3 | 2012 | 0 | ff |
C | 3 | 2012 | 0 | gg |
C | 3 | 2013 | 0 | ff |
C | 3 | 2014 | 0 | ff |
D | 4 | 2005 | 1 | hh |
D | 4 | 2006 | 0 | hh |
D | 4 | 2010 | 0 | ii |
D | 4 | 2011 | 0 | ii |
D | 4 | 2012 | 0 | ii |
D | 4 | 2014 | 0 | jj |
D | 4 | 2015 | 0 | jj |
The expected outcome is the following:
Person | Person_num | Year | Firstyear | Company | Save | Want |
A | 1 | 2005 | 1 | aa | aa | 1 |
A | 1 | 2006 | 0 | aa | aa | 0 |
A | 1 | 2007 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | aa | aa | 0 |
A | 1 | 2008 | 0 | bb | aa | 0 |
A | 1 | 2009 | 0 | aa | aa | 0 |
A | 1 | 2009 | 0 | bb | aa | 0 |
B | 2 | 2008 | 1 | cc | cc | 1 |
B | 2 | 2009 | 0 | dd | cc | 0 |
B | 2 | 2009 | 0 | cc | cc | 0 |
B | 2 | 2010 | 0 | dd | dd | 1 |
B | 2 | 2011 | 0 | dd | dd | 0 |
B | 2 | 2012 | 0 | dd | dd | 0 |
C | 3 | 2002 | 1 | ee | {ee, ff} | 1 |
C | 3 | 2002 | 1 | ff | {ee, ff} | 1 |
C | 3 | 2003 | 0 | ee | {ee, ff} | 0 |
C | 3 | 2003 | 0 | ff | {ee, ff} | 0 |
C | 3 | 2004 | 0 | ff | ff | 0 |
C | 3 | 2008 | 0 | ff | ff | 0 |
C | 3 | 2009 | 0 | ff | ff | 0 |
C | 3 | 2010 | 0 | ff | ff | 0 |
C | 3 | 2011 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | ff | ff | 0 |
C | 3 | 2012 | 0 | gg | ff | 0 |
C | 3 | 2013 | 0 | ff | ff | 0 |
C | 3 | 2014 | 0 | ff | ff | 0 |
D | 4 | 2005 | 1 | hh | hh | 1 |
D | 4 | 2006 | 0 | hh | hh | 0 |
D | 4 | 2010 | 0 | ii | ii | 1 |
D | 4 | 2011 | 0 | ii | ii | 0 |
D | 4 | 2012 | 0 | ii | ii | 0 |
D | 4 | 2014 | 0 | jj | jj | 1 |
D | 4 | 2015 | 0 | jj | jj | 0 |
The logic behind is the following:
else if the company in save then save = lag(save) and want = 0;
else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;
else save = company and want = 1;
My attempt is the following:
data have;
input Person $ Person_num Year Firstyear Company $;
datalines;
A 1 2005 1 aa
A 1 2006 0 aa
A 1 2007 0 aa
A 1 2008 0 aa
A 1 2008 0 bb
A 1 2009 0 aa
A 1 2009 0 bb
B 2 2008 1 cc
B 2 2009 0 dd
B 2 2009 0 cc
B 2 2010 0 dd
B 2 2011 0 dd
B 2 2012 0 dd
C 3 2002 1 ee
C 3 2002 1 ff
C 3 2003 0 ee
C 3 2003 0 ff
C 3 2004 0 ff
C 3 2008 0 ff
C 3 2009 0 ff
C 3 2010 0 ff
C 3 2011 0 ff
C 3 2012 0 ff
C 3 2012 0 gg
C 3 2013 0 ff
C 3 2014 0 ff
D 4 2005 1 hh
D 4 2006 0 hh
D 4 2010 0 ii
D 4 2011 0 ii
D 4 2012 0 ii
D 4 2014 0 jj
D 4 2015 0 jj
;
data want; set have;
do i = 1 to 4;
if person_num = i then do;
if firstyear = 1 then want = 1 and save = company; *How can assign multiple variables into the save variable when Person = C;
if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0;
else if company in save then save = lag(save) and want = 0;
else if; *I have no idea in this part.;
else save = company and want = 1;
end;
end;
run;
Is there a way to reflect my logic in SAS?
If you know the way, please let me know.
Thank you for your help.
Thank you for providing a full program.
Your needs are unclear:
1. What's the CODE variable?
2. Please explain how you derive the last 2 WANT with value=1.
3. Please ensure that you provide test data and results covering all the test cases you mention.
Hello everyone,
My dataset is the following:
Person
Person_num
Year
Firstyear
Company
A
1
2005
1
aa
A
1
2006
0
aa
A
1
2007
0
aa
A
1
2008
0
aa
A
1
2008
0
bb
A
1
2009
0
aa
A
1
2009
0
bb
B
2
2008
1
cc
B
2
2009
0
dd
B
2
2009
0
cc
B
2
2010
0
dd
B
2
2011
0
dd
B
2
2012
0
dd
C
3
2002
1
ee
C
3
2002
1
ff
C
3
2003
0
ee
C
3
2003
0
ff
C
3
2004
0
ff
C
3
2008
0
ff
C
3
2009
0
ff
C
3
2010
0
ff
C
3
2011
0
ff
C
3
2012
0
ff
C
3
2012
0
gg
C
3
2013
0
ff
C
3
2014
0
ff
D
4
2005
1
hh
D
4
2006
0
hh
D
4
2010
0
ii
D
4
2011
0
ii
D
4
2012
0
ii
D
4
2014
0
jj
D
4
2015
0
jj
The expected outcome is the following:
Person
Person_num
Year
Firstyear
Company
Save
Want
A
1
2005
1
aa
aa
1
A
1
2006
0
aa
aa
0
A
1
2007
0
aa
aa
0
A
1
2008
0
aa
aa
0
A
1
2008
0
bb
aa
0
A
1
2009
0
aa
aa
0
A
1
2009
0
bb
aa
0
B
2
2008
1
cc
cc
1
B
2
2009
0
dd
cc
0
B
2
2009
0
cc
cc
0
B
2
2010
0
dd
dd
1
B
2
2011
0
dd
dd
0
B
2
2012
0
dd
dd
0
C
3
2002
1
ee
{ee, ff}
1
C
3
2002
1
ff
{ee, ff}
1
C
3
2003
0
ee
{ee, ff}
0
C
3
2003
0
ff
{ee, ff}
0
C
3
2004
0
ff
ff
0
C
3
2008
0
ff
ff
0
C
3
2009
0
ff
ff
0
C
3
2010
0
ff
ff
0
C
3
2011
0
ff
ff
0
C
3
2012
0
ff
ff
0
C
3
2012
0
gg
ff
0
C
3
2013
0
ff
ff
0
C
3
2014
0
ff
ff
0
D
4
2005
1
hh
hh
1
D
4
2006
0
hh
hh
0
D
4
2010
0
ii
ii
1
D
4
2011
0
ii
ii
0
D
4
2012
0
ii
ii
0
D
4
2014
0
jj
jj
1
D
4
2015
0
jj
jj
0
The logic behind is the following:
- If firstyear = 1 then want = 1;
- If firstyear = 1, save = {companies}. That is, save include all companies when firstyear = 1. (Is it possible to save multiple variables into the save variable?)
- If firstyear ne 0 and company = lag(save), then save = lag(save) and want = 0;
else if the company in save then save = lag(save) and want = 0;
else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;
else save = code and want = 1;
My attempt is the following:
data have; input Person $ Person_num Year Firstyear Company $; datalines; A 1 2005 1 aa A 1 2006 0 aa A 1 2007 0 aa A 1 2008 0 aa A 1 2008 0 bb A 1 2009 0 aa A 1 2009 0 bb B 2 2008 1 cc B 2 2009 0 dd B 2 2009 0 cc B 2 2010 0 dd B 2 2011 0 dd B 2 2012 0 dd C 3 2002 1 ee C 3 2002 1 ff C 3 2003 0 ee C 3 2003 0 ff C 3 2004 0 ff C 3 2008 0 ff C 3 2009 0 ff C 3 2010 0 ff C 3 2011 0 ff C 3 2012 0 ff C 3 2012 0 gg C 3 2013 0 ff C 3 2014 0 ff D 4 2005 1 hh D 4 2006 0 hh D 4 2010 0 ii D 4 2011 0 ii D 4 2012 0 ii D 4 2014 0 jj D 4 2015 0 jj ; data want; set have; do i = 1 to 4; if person_num = i then do; if firstyear = 1 then want = 1 and save = code; *How can assign multiple variables into the save variable when Person = C; if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0; else if company in save then save = lag(save) and want = 0; else if; *I have no idea in this part.; else save = code and want = 1; end; end; run;
Is there a way to reflect my logic in SAS?
If you know the way, please let me know.
Thank you for your help.
1. What's the CODE variable?
A) code = company; Sorry for your uncomfortableness. (I corrected it.)
2. Please explain how you derive the last 2 WANT with value=1.
A)
In case of [D 4 2010 0 ii], code(=ii) is different with lag(save)(=hh) and is not included in lag(save). Also, there is no observation which has code(=hh) in 2010 because of single-year observation in 2010. Therefore, by the logic 3, the want variable of the corresponding observation has the value of 1.
Actually, [D 4 2014 0 jj] case is same as the aforementioned case.
code(=jj) is different with lag(save)(=ii) and not included in lag(save). Also, there is no observation which has code(=ii) in 2014 because of single-year observation in 2014. Therefore, by the logic 3, the want variable of the corresponding observation has the value of 1.
3. Please ensure that you provide test data and results covering all the test cases you mention.
A) Test data is the uploaded dataset and the result is also the uploaded expected result.
Thanks for your help!!
> Test data is the uploaded dataset and the result is also the uploaded expected result.
You seem to have 4 possible outcomes:
Which values fall into which outcomes?
Thanks for your comments. I revised some points. Please check the following:
Person | Person_num | Year | Firstyear | Company | Year_save | Previous_year_save | Save | Lag_save | Want | Condition | ||
A | 1 | 2005 | 1 | aa | aa | . | aa | . | 1 | 0 | ||
A | 1 | 2006 | 0 | aa | aa | aa | aa | aa | 0 | 1 | ||
A | 1 | 2007 | 0 | aa | aa | aa | aa | aa | 0 | 1 | ||
A | 1 | 2008 | 0 | aa | {aa, bb} | aa | aa | aa | 0 | 4 | ||
A | 1 | 2008 | 0 | bb | {aa, bb} | aa | aa | aa | 0 | 4 | ||
A | 1 | 2009 | 0 | aa | {aa, bb} | {aa, bb} | aa | aa | 0 | 1 | ||
A | 1 | 2009 | 0 | bb | {aa, bb} | {aa, bb} | aa | aa | 0 | 1 | ||
B | 2 | 2008 | 1 | cc | cc | . | cc | . | 1 | 0 | ||
B | 2 | 2009 | 0 | dd | {cc, dd} | cc | cc | cc | 0 | 4 | ||
B | 2 | 2009 | 0 | cc | {cc, dd} | cc | cc | cc | 0 | 4 | ||
B | 2 | 2010 | 0 | dd | dd | {cc, dd} | dd | cc | 1 | 2 | ||
B | 2 | 2011 | 0 | dd | dd | dd | dd | dd | 0 | 1 | ||
B | 2 | 2012 | 0 | dd | dd | dd | dd | dd | 0 | 1 | ||
C | 3 | 2002 | 1 | ee | {ee, ff} | . | {ee, ff} | . | 1 | 0 | ||
C | 3 | 2002 | 1 | ff | {ee, ff} | . | {ee, ff} | . | 1 | 0 | ||
C | 3 | 2003 | 0 | ee | {ee, ff} | {ee, ff} | {ee, ff} | {ee, ff} | 0 | 1 | ||
C | 3 | 2003 | 0 | ff | {ee, ff} | {ee, ff} | {ee, ff} | {ee, ff} | 0 | 1 | ||
C | 3 | 2004 | 0 | ff | ff | {ee, ff} | ff | {ee, ff} | 0 | 3 | ||
C | 3 | 2008 | 0 | ff | ff | ff | ff | ff | 0 | 1 | ||
C | 3 | 2009 | 0 | ff | ff | ff | ff | ff | 0 | 1 | ||
C | 3 | 2010 | 0 | ff | ff | ff | ff | ff | 0 | 1 | ||
C | 3 | 2011 | 0 | ff | ff | ff | ff | ff | 0 | 1 | ||
C | 3 | 2012 | 0 | ff | {ff, gg} | ff | ff | ff | 0 | 4 | ||
C | 3 | 2012 | 0 | gg | {ff, gg} | ff | ff | ff | 0 | 4 | ||
C | 3 | 2013 | 0 | ff | ff | {ff, gg} | ff | ff | 0 | 3 | ||
C | 3 | 2014 | 0 | ff | ff | ff | ff | ff | 0 | 1 | ||
D | 4 | 2005 | 1 | hh | hh | . | hh | . | 1 | 0 | ||
D | 4 | 2006 | 0 | hh | hh | hh | hh | hh | 0 | 1 | ||
D | 4 | 2010 | 0 | ii | ii | hh | ii | hh | 1 | 5 | ||
D | 4 | 2011 | 0 | ii | ii | ii | ii | ii | 0 | 1 | ||
D | 4 | 2012 | 0 | ii | ii | ii | ii | ii | 0 | 1 | ||
D | 4 | 2014 | 0 | jj | jj | ii | jj | ii | 1 | 5 | ||
D | 4 | 2015 | 0 | jj | jj | jj | jj | jj | 0 | 1 |
“Year_save” includes all companies in each year.
Previous_year_save represents the previous “Year_save” value.
(“Previous” means the closest past year observation in the same “Person”.)
“Lag_save” is the lag1 variable of “Save”.
Want variable is a dummy variable.
Condition represents the logic below corresponding to the “want” value.
First of all, I want to create "Year_save" and "Previous_year_save" value. Is there an easy way?
Thanks.
Want | Condition | |
1 | 0 | |
0 | 1 | |
0 | 1 | |
0 | 4 | |
0 | 4 | |
0 | 1 | |
0 | 1 | |
1 | 0 | |
0 | 4 | |
0 | 4 | |
1 | 2 | |
0 | 1 | |
0 | 1 | |
1 | 0 | |
1 | 0 | |
0 | 1 | |
0 | 1 | |
0 | 3 | |
0 | 1 | |
0 | 1 | |
0 | 1 | |
0 | 1 | |
0 | 4 | |
0 | 4 | |
0 | 3 | |
0 | 1 | |
1 | 0 | |
0 | 1 | |
1 | 5 | |
0 | 1 | |
0 | 1 | |
1 | 5 | |
0 | 1 |
Your question makes no sense at all.
Person Person_num Year Firstyear Company Year_save Previous_year_save Save Lag_save Want Condition A 1 2005 1 aa aa . aa . 1 0 A 1 2006 0 aa aa aa aa aa 0 1 A 1 2007 0 aa aa aa aa aa 0 1
1. How are these lines for condition 1, when condition 1 is for FIRSTYEAR=1 ?
2. Condition 4 is for WANT = . yet this variable is never missing in your example.
3. Sorry too much confusion and time spent already, I am out.
4. First of all, I want to create "Year_save" and "Previous_year_save" value.
Look at the lag() function - and make sure to never use it in IF blocks.
Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.