BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi SAS Expert,

 

I have a dataset named "treatment_sample" as attached. This dataset including 3 countries (Argentina, Australia, and Austria). A quick description as below:

ENAME GEOGN     YEAR acc_pay ACC_PAY_TUR   ACC_STA         LAGS18
MOQ AUSTRALIA 2002 2317 6.0917808219 Local standards 3.71
MOQ AUSTRALIA 2004 781 13.70323168 Local standards 4.68
MOQ AUSTRALIA 2005 1585 11.488588335 Local standards 4.58
MOQ AUSTRALIA 2007 881 6.4921190893 IFRS 2.68
MOQ AUSTRALIA 2008 893 4.3393461105 IFRS 1.78

Year is a character variable.

Data of Australia is in years  2001, 2002 and from 2004 to 2008

Data of Austria is in years    2004, 2005 and from 2007 to 2011

Data of Belgium is in years  2002, 2003 and from 2005 to 2009

Data of Brazil is in years      1998,1999 and from 2001 to 2005

 

I want to create two variables:

Variable x=1 for every observation in the last five years of each country. For example, all observations in years from 2004 to 2008 in Australia will get the value 1. Or observations in years from 2007 to 2011 in Austria will get the value 1. Otherwise, x get value 0.

 

Another question is to create a variable y equals 0 for every observation in the whole dataset.

 

Many thanks and warm regards.

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
qoit
Pyrite | Level 9

Hi, does the below help using an example dataset:

 

data have;
	input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
	infile datalines dlm = "," dsd;
	num_year = input(year,best12.);
	datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;

proc means data=have noprint nway;
	class geogn;
	var num_year;
	output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;

proc sort data=have;
	by ename geogn;
run;

data have2 (drop=num_:);
	merge have mini_have;
	by geogn;
	retain y 0;

	/* X variable = 1 if it's within last 5 years of that particular Country*/
	if num_year_max - num_year < 5 then
		x=1;
	else x = 0;
run;

View solution in original post

9 REPLIES 9
PaigeMiller
Diamond | Level 26

I want to create two variables:

Variable x=1 for every observation in the last five years of each country. For example, all observations in years from 2004 to 2008 in Australia will get the value 1. Or observations in years from 2007 to 2011 in Austria will get the value 1. Otherwise, x get value 0.

 

Do you have to determine the last five years from the data? Or can it be that for Australia we hard-code 2004 to 2008, and similarly hard-code the years for all the other countries? If we have to determine the last five years from the data, what should happen if the last five years in the data for a country are not consecutive?

--
Paige Miller
Phil_NZ
Barite | Level 11

Hi @PaigeMiller 

 

Thank you for your pointing, it is my bad for confusing you.


@PaigeMiller wrote:

Do you have to determine the last five years from the data? Or can it be that for Australia we hard-code 2004 to 2008, and similarly hard-code the years for all the other countries? If we have to determine the last five years from the data, what should happen if the last five years in the data for a country are not consecutive?


The last five years here is as documented in my questions then, they are different depending on the countries.

Data of Australia is in years  2001, 2002 and from 2004 to 2008

Data of Austria is in years    2004, 2005 and from 2007 to 2011

Data of Belgium is in years  2002, 2003 and from 2005 to 2009

Data of Brazil is in years      1998,1999 and from 2001 to 2005

 

 

So, in this case, the last five years of Austrialia is 2004 to 2008, of Austria is 2007 to 2011, Belgium is 2005 to 2009, and Brazil is 2001 to 2005.

 

Please let me know if I understand you improperly.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
Shmuel
Garnet | Level 18

Relating to "Another question is to create a variable y equals 0 for every observation in the whole dataset." - use next statement in a data step:

retain Y 0;

 

Shmuel
Garnet | Level 18

You are not consistent with your description. 

You wrote "This dataset including 3 countries" but you mention 4 countries in the next lines.

and please relate to @PaigeMiller 's post too.

Phil_NZ
Barite | Level 11

Hi @Shmuel 

 

Sorry for such an inconsistency, it is 4 countries.

 

Many thanks.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
maguiremq
SAS Super FREQ

It's a little difficult to figure out what is going on. Instead of providing a data set, you should write a DATALINES statement that captures all your issues. People here, including myself, will not download or open an attachment. I've included my own data set based off your information, but I don't know if I captured all the idiosyncrasies of your data set.

 

data have;
input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
infile datalines dlm = "," dsd;
datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;

proc sort data = have
	out = have_2;
		by geogn descending year; /* get the most recent year on top for each geogn */
run;

data have_3;
	set have_2;
	by geogn descending year; /* do the following for each geogn by year */
		y = 0; /* creating variable 'y' that is equal to 0 for the entire data set */
		if first.geogn then counter = 1; /* if it's the first country, then set counter = 1 */
			else counter + 1; /* Otherwise, if it's the same country, increment by 1 */
		if counter in (1:5) then x = 1; /* if the counter is in the first 5, create a flag called x and set to 1 */
			else x = 0; /* otherwise set it to 0. */
run;

Here's a helpful blog on by-group processing as well as first last processing.

 

https://blogs.sas.com/content/iml/2018/02/26/how-to-use-first-variable-and-last-variable-in-a-by-gro... 

 

Also, @Shmuel's method of using a retain statement is more efficient than what I included here.

qoit
Pyrite | Level 9

Hi, does the below help using an example dataset:

 

data have;
	input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
	infile datalines dlm = "," dsd;
	num_year = input(year,best12.);
	datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;

proc means data=have noprint nway;
	class geogn;
	var num_year;
	output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;

proc sort data=have;
	by ename geogn;
run;

data have2 (drop=num_:);
	merge have mini_have;
	by geogn;
	retain y 0;

	/* X variable = 1 if it's within last 5 years of that particular Country*/
	if num_year_max - num_year < 5 then
		x=1;
	else x = 0;
run;
Phil_NZ
Barite | Level 11

Hello @qoit !

Many thanks for your answer, I am able to understand it.

I just want to know if the code below is okay, I tested successfully but want to make sure my change does not cause any hidden problem that I did not notice enough

libname sample "C:\Users\pnguyen\Desktop\New folder";


data treatment;
	set sample.treatment_sample;
   num_year = input(year,best12.);
run;



proc means data=treatment noprint nway;
	class geogn; /*I just class only by geogn because I do not know why I need to sort by ename*/
	var num_year;
	output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;

proc sort data=treatment;
	by geogn;
run;

data have2 (drop=num_:);
	merge treatment mini_have;
	by geogn;
	retain y 0;

	/* X variable = 1 if it's within last 5 years of that particular Country*/
	if num_year_max - num_year < 5 then
		x=1;
	else x = 0;
run;

Many thanks and warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
qoit
Pyrite | Level 9
No worries, based on what you are trying to achieve, above should definitely work! Good luck 🙂

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 817 views
  • 6 likes
  • 5 in conversation