BookmarkSubscribeRSS Feed
cool1993
Fluorite | Level 6

Retain multiple variables and use subset within do loop

 

Hello everyone,

 

My dataset is the following:

 

Person

Person_num

Year

Firstyear

Company

A

1

2005

1

aa

A

1

2006

0

aa

A

1

2007

0

aa

A

1

2008

0

aa

A

1

2008

0

bb

A

1

2009

0

aa

A

1

2009

0

bb

B

2

2008

1

cc

B

2

2009

0

dd

B

2

2009

0

cc

B

2

2010

0

dd

B

2

2011

0

dd

B

2

2012

0

dd

C

3

2002

1

ee

C

3

2002

1

ff

C

3

2003

0

ee

C

3

2003

0

ff

C

3

2004

0

ff

C

3

2008

0

ff

C

3

2009

0

ff

C

3

2010

0

ff

C

3

2011

0

ff

C

3

2012

0

ff

C

3

2012

0

gg

C

3

2013

0

ff

C

3

2014

0

ff

D

4

2005

1

hh

D

4

2006

0

hh

D

4

2010

0

ii

D

4

2011

0

ii

D

4

2012

0

ii

D

4

2014

0

jj

D

4

2015

0

jj

 

The expected outcome is the following:

 

Person

Person_num

Year

Firstyear

Company

Save

Want

A

1

2005

1

aa

aa

1

A

1

2006

0

aa

aa

0

A

1

2007

0

aa

aa

0

A

1

2008

0

aa

aa

0

A

1

2008

0

bb

aa

0

A

1

2009

0

aa

aa

0

A

1

2009

0

bb

aa

0

B

2

2008

1

cc

cc

1

B

2

2009

0

dd

cc

0

B

2

2009

0

cc

cc

0

B

2

2010

0

dd

dd

1

B

2

2011

0

dd

dd

0

B

2

2012

0

dd

dd

0

C

3

2002

1

ee

{ee, ff}

1

C

3

2002

1

ff

{ee, ff}

1

C

3

2003

0

ee

{ee, ff}

0

C

3

2003

0

ff

{ee, ff}

0

C

3

2004

0

ff

ff

0

C

3

2008

0

ff

ff

0

C

3

2009

0

ff

ff

0

C

3

2010

0

ff

ff

0

C

3

2011

0

ff

ff

0

C

3

2012

0

ff

ff

0

C

3

2012

0

gg

ff

0

C

3

2013

0

ff

ff

0

C

3

2014

0

ff

ff

0

D

4

2005

1

hh

hh

1

D

4

2006

0

hh

hh

0

D

4

2010

0

ii

ii

1

D

4

2011

0

ii

ii

0

D

4

2012

0

ii

ii

0

D

4

2014

0

jj

jj

1

D

4

2015

0

jj

jj

0

 

The logic behind is the following:

 

  1. If firstyear = 1 then want = 1; 
  2. If firstyear = 1, save = {companies}. That is, save include all companies when firstyear = 1. (Is it possible to save multiple variables into the save variable?)
  3. If firstyear ne 0 and company = lag(save), then save = lag(save) and want = 0;

    else if the company in save then save = lag(save) and want = 0;

    else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;

    else save = company and want = 1;

 

My attempt is the following:

data have;
input Person $ Person_num Year Firstyear Company $;
datalines;
A	1	2005	1	aa
A	1	2006	0	aa
A	1	2007	0	aa
A	1	2008	0	aa
A	1	2008	0	bb
A	1	2009	0	aa
A	1	2009	0	bb
B	2	2008	1	cc
B	2	2009	0	dd
B	2	2009	0	cc
B	2	2010	0	dd
B	2	2011	0	dd
B	2	2012	0	dd
C	3	2002	1	ee
C	3	2002	1	ff
C	3	2003	0	ee
C	3	2003	0	ff
C	3	2004	0	ff
C	3	2008	0	ff
C	3	2009	0	ff
C	3	2010	0	ff
C	3	2011	0	ff
C	3	2012	0	ff
C	3	2012	0	gg
C	3	2013	0	ff
C	3	2014	0	ff
D	4	2005	1	hh
D	4	2006	0	hh
D	4	2010	0	ii
D	4	2011	0	ii
D	4	2012	0	ii
D	4	2014	0	jj
D	4	2015	0	jj
;

data want; set have;
do i = 1 to 4;
if person_num = i then do;
 	if firstyear = 1 then want = 1 and save = company; *How can assign multiple variables into the save variable when Person = C;
	if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0;
	else if company in save then save = lag(save) and want = 0;
	else if; *I have no idea in this part.;
	else save = company and want = 1;
end;
end;
run;

 

Is there a way to reflect my logic in SAS? 

 

If you know the way, please let me know.

 

Thank you for your help.

6 REPLIES 6
ChrisNZ
Tourmaline | Level 20

Thank you for providing a full program.

 

Your needs are unclear:

1. What's the CODE variable?

2. Please explain how you derive the last 2 WANT with value=1.

3. Please ensure that you provide test data and results covering all the test cases you mention.

 

Hello everyone,

 

My dataset is the following:

 

Person

Person_num

Year

Firstyear

Company

A

1

2005

1

aa

A

1

2006

0

aa

A

1

2007

0

aa

A

1

2008

0

aa

A

1

2008

0

bb

A

1

2009

0

aa

A

1

2009

0

bb

B

2

2008

1

cc

B

2

2009

0

dd

B

2

2009

0

cc

B

2

2010

0

dd

B

2

2011

0

dd

B

2

2012

0

dd

C

3

2002

1

ee

C

3

2002

1

ff

C

3

2003

0

ee

C

3

2003

0

ff

C

3

2004

0

ff

C

3

2008

0

ff

C

3

2009

0

ff

C

3

2010

0

ff

C

3

2011

0

ff

C

3

2012

0

ff

C

3

2012

0

gg

C

3

2013

0

ff

C

3

2014

0

ff

D

4

2005

1

hh

D

4

2006

0

hh

D

4

2010

0

ii

D

4

2011

0

ii

D

4

2012

0

ii

D

4

2014

0

jj

D

4

2015

0

jj

 

The expected outcome is the following:

 

Person

Person_num

Year

Firstyear

Company

Save

Want

A

1

2005

1

aa

aa

1

A

1

2006

0

aa

aa

0

A

1

2007

0

aa

aa

0

A

1

2008

0

aa

aa

0

A

1

2008

0

bb

aa

0

A

1

2009

0

aa

aa

0

A

1

2009

0

bb

aa

0

B

2

2008

1

cc

cc

1

B

2

2009

0

dd

cc

0

B

2

2009

0

cc

cc

0

B

2

2010

0

dd

dd

1

B

2

2011

0

dd

dd

0

B

2

2012

0

dd

dd

0

C

3

2002

1

ee

{ee, ff}

1

C

3

2002

1

ff

{ee, ff}

1

C

3

2003

0

ee

{ee, ff}

0

C

3

2003

0

ff

{ee, ff}

0

C

3

2004

0

ff

ff

0

C

3

2008

0

ff

ff

0

C

3

2009

0

ff

ff

0

C

3

2010

0

ff

ff

0

C

3

2011

0

ff

ff

0

C

3

2012

0

ff

ff

0

C

3

2012

0

gg

ff

0

C

3

2013

0

ff

ff

0

C

3

2014

0

ff

ff

0

D

4

2005

1

hh

hh

1

D

4

2006

0

hh

hh

0

D

4

2010

0

ii

ii

1

D

4

2011

0

ii

ii

0

D

4

2012

0

ii

ii

0

D

4

2014

0

jj

jj

1

D

4

2015

0

jj

jj

0

 

The logic behind is the following:

 

  1. If firstyear = 1 then want = 1; 
  2. If firstyear = 1, save = {companies}. That is, save include all companies when firstyear = 1. (Is it possible to save multiple variables into the save variable?)
  3. If firstyear ne 0 and company = lag(save), then save = lag(save) and want = 0;

    else if the company in save then save = lag(save) and want = 0;

    else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;

    else save = code and want = 1;

 

My attempt is the following:

data have;
input Person $ Person_num Year Firstyear Company $;
datalines;
A	1	2005	1	aa
A	1	2006	0	aa
A	1	2007	0	aa
A	1	2008	0	aa
A	1	2008	0	bb
A	1	2009	0	aa
A	1	2009	0	bb
B	2	2008	1	cc
B	2	2009	0	dd
B	2	2009	0	cc
B	2	2010	0	dd
B	2	2011	0	dd
B	2	2012	0	dd
C	3	2002	1	ee
C	3	2002	1	ff
C	3	2003	0	ee
C	3	2003	0	ff
C	3	2004	0	ff
C	3	2008	0	ff
C	3	2009	0	ff
C	3	2010	0	ff
C	3	2011	0	ff
C	3	2012	0	ff
C	3	2012	0	gg
C	3	2013	0	ff
C	3	2014	0	ff
D	4	2005	1	hh
D	4	2006	0	hh
D	4	2010	0	ii
D	4	2011	0	ii
D	4	2012	0	ii
D	4	2014	0	jj
D	4	2015	0	jj
;

data want; set have;
do i = 1 to 4;
if person_num = i then do;
 	if firstyear = 1 then want = 1 and save = code; *How can assign multiple variables into the save variable when Person = C;
	if firstyear ne 0 and company = lag(save) then save = lag(save) and want = 0;
	else if company in save then save = lag(save) and want = 0;
	else if; *I have no idea in this part.;
	else save = code and want = 1;
end;
end;
run;

 

Is there a way to reflect my logic in SAS? 

 

If you know the way, please let me know.

 

Thank you for your help.

cool1993
Fluorite | Level 6

1. What's the CODE variable?


A) code = company; Sorry for your uncomfortableness. (I corrected it.)

 

2. Please explain how you derive the last 2 WANT with value=1.

 

A)
In case of [D 4 2010 0 ii], code(=ii) is different with lag(save)(=hh) and is not included in lag(save). Also, there is no observation which has code(=hh) in 2010 because of single-year observation in 2010. Therefore, by the logic 3, the want variable of the corresponding observation has the value of 1.

Actually, [D 4 2014 0 jj] case is same as the aforementioned case.
code(=jj) is different with lag(save)(=ii) and not included in lag(save). Also, there is no observation which has code(=ii) in 2014 because of single-year observation in 2014. Therefore, by the logic 3, the want variable of the corresponding observation has the value of 1.

 

3. Please ensure that you provide test data and results covering all the test cases you mention.


A) Test data is the uploaded dataset and the result is also the uploaded expected result.

 

Thanks for your help!!

ChrisNZ
Tourmaline | Level 20

> Test data is the uploaded dataset and the result is also the uploaded expected result.

You seem to have 4 possible outcomes:

  1.  and company = lag(save), then save = lag(save) and want = 0;
  2. else if the company in save then save = lag(save) and want = 0;
  3. else if, among the same year observations, there is an observation whose company is equal to the current observation's save variable, then save = lag(save) and want = 0;
  4. else save = code and want = 1;

Which values fall into which outcomes? 

cool1993
Fluorite | Level 6

Thanks for your comments. I revised some points. Please check the following:

 

PersonPerson_numYearFirstyearCompanyYear_savePrevious_year_save SaveLag_saveWant Condition
A120051aaaa. aa.1 0
A120060aaaaaa aaaa0 1
A120070aaaaaa aaaa0 1
A120080aa{aa, bb}aa aaaa0 4
A120080bb{aa, bb}aa aaaa0 4
A120090aa{aa, bb}{aa, bb} aaaa0 1
A120090bb{aa, bb}{aa, bb} aaaa0 1
B220081cccc. cc.1 0
B220090dd{cc, dd}cc cc      cc0 4
B220090cc{cc, dd}cc cccc0 4
B220100dddd{cc, dd} ddcc1 2
B220110dddddd dddd0 1
B220120dddddd dddd0 1
C320021ee{ee, ff}. {ee, ff}.1 0
C320021ff{ee, ff}. {ee, ff}.1 0
C320030ee{ee, ff}{ee, ff} {ee, ff}{ee, ff}0 1
C320030ff{ee, ff}{ee, ff} {ee, ff}{ee, ff}0 1
C320040ffff{ee, ff} ff{ee, ff}0 3
C320080ffffff ffff0 1
C320090ffffff ffff0 1
C320100ffffff ffff0 1
C320110ffffff ffff0 1
C320120ff{ff, gg}ff ffff0 4
C320120gg{ff, gg}ff ffff0 4
C320130ffff{ff, gg} ffff0 3
C320140ffffff ffff0 1
D420051hhhh. hh.1 0
D420060hhhhhh hhhh0 1
D420100iiiihh iihh1 5
D420110iiiiii iiii0 1
D420120iiiiii iiii0 1
D420140jjjjii jjii1 5
D420150jjjjjj jjjj0 1

 

“Year_save” includes all companies in each year.

 

Previous_year_save represents the previous “Year_save” value.

(“Previous” means the closest past year observation in the same “Person”.)

 

“Lag_save” is the lag1 variable of “Save”.

 

Want variable is a dummy variable.

 

Condition represents the logic below corresponding to the “want” value.

 

  1. If firstyear = 1, then save = year_save and want = 1;
  2. If firstyear = 0 and Year_save = Previous_year_save, then save = Year_save and want = 0;
  3. If want = . and the Year_save ne Previous_year_save but Year_save is included in Previous_year_save and company is not included in lag_save, then save = Year_save and want = 1;
  4. Else if want = . and the Year_save ne Previous_year_save but Year_save is included in Previous_year_save, then save = Year_save and want = 0;
  5. If want = . and, Year_save includes Previous_year_save, then save = Previous_year_save and want = 0;
  6. If want = . then save = code and want = 1;

 

First of all, I want to create "Year_save" and "Previous_year_save" value. Is there an easy way?

 

Thanks.

cool1993
Fluorite | Level 6
Want Condition
1 0
0 1
0 1
0 4
0 4
0 1
0 1
1 0
0 4
0 4
1 2
0 1
0 1
1 0
1 0
0 1
0 1
0 3
0 1
0 1
0 1
0 1
0 4
0 4
0 3
0 1
1 0
0 1
1 5
0 1
0 1
1 5
0 1
ChrisNZ
Tourmaline | Level 20

Your question makes no sense at all.

Person Person_num Year Firstyear Company Year_save Previous_year_save   Save Lag_save Want   Condition
A 1 2005 1 aa aa .   aa . 1   0
A 1 2006 0 aa aa aa   aa aa 0   1
A 1 2007 0 aa aa aa   aa aa 0   1

 

1. How are these lines for condition 1, when condition 1 is for FIRSTYEAR=1 ?

 

2. Condition 4 is for WANT = .  yet this variable is never missing in your example.

 

3. Sorry too much confusion and time spent already, I am out.

 

4. First of all, I want to create "Year_save" and "Previous_year_save" value. 

Look at the lag() function - and make sure to never use it in IF blocks.

 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1413 views
  • 0 likes
  • 2 in conversation