Hi SAS Expert,
I have a dataset named "treatment_sample" as attached. This dataset including 3 countries (Argentina, Australia, and Austria). A quick description as below:
ENAME GEOGN YEAR acc_pay ACC_PAY_TUR ACC_STA LAGS18
MOQ AUSTRALIA 2002 2317 6.0917808219 Local standards 3.71
MOQ AUSTRALIA 2004 781 13.70323168 Local standards 4.68
MOQ AUSTRALIA 2005 1585 11.488588335 Local standards 4.58
MOQ AUSTRALIA 2007 881 6.4921190893 IFRS 2.68
MOQ AUSTRALIA 2008 893 4.3393461105 IFRS 1.78
Year is a character variable.
Data of Australia is in years 2001, 2002 and from 2004 to 2008
Data of Austria is in years 2004, 2005 and from 2007 to 2011
Data of Belgium is in years 2002, 2003 and from 2005 to 2009
Data of Brazil is in years 1998,1999 and from 2001 to 2005
I want to create two variables:
Variable x=1 for every observation in the last five years of each country. For example, all observations in years from 2004 to 2008 in Australia will get the value 1. Or observations in years from 2007 to 2011 in Austria will get the value 1. Otherwise, x get value 0.
Another question is to create a variable y equals 0 for every observation in the whole dataset.
Many thanks and warm regards.
Hi, does the below help using an example dataset:
data have;
input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
infile datalines dlm = "," dsd;
num_year = input(year,best12.);
datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;
proc means data=have noprint nway;
class geogn;
var num_year;
output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;
proc sort data=have;
by ename geogn;
run;
data have2 (drop=num_:);
merge have mini_have;
by geogn;
retain y 0;
/* X variable = 1 if it's within last 5 years of that particular Country*/
if num_year_max - num_year < 5 then
x=1;
else x = 0;
run;
I want to create two variables:
Variable x=1 for every observation in the last five years of each country. For example, all observations in years from 2004 to 2008 in Australia will get the value 1. Or observations in years from 2007 to 2011 in Austria will get the value 1. Otherwise, x get value 0.
Do you have to determine the last five years from the data? Or can it be that for Australia we hard-code 2004 to 2008, and similarly hard-code the years for all the other countries? If we have to determine the last five years from the data, what should happen if the last five years in the data for a country are not consecutive?
Hi @PaigeMiller
Thank you for your pointing, it is my bad for confusing you.
@PaigeMiller wrote:
Do you have to determine the last five years from the data? Or can it be that for Australia we hard-code 2004 to 2008, and similarly hard-code the years for all the other countries? If we have to determine the last five years from the data, what should happen if the last five years in the data for a country are not consecutive?
The last five years here is as documented in my questions then, they are different depending on the countries.
Data of Australia is in years 2001, 2002 and from 2004 to 2008
Data of Austria is in years 2004, 2005 and from 2007 to 2011
Data of Belgium is in years 2002, 2003 and from 2005 to 2009
Data of Brazil is in years 1998,1999 and from 2001 to 2005
So, in this case, the last five years of Austrialia is 2004 to 2008, of Austria is 2007 to 2011, Belgium is 2005 to 2009, and Brazil is 2001 to 2005.
Please let me know if I understand you improperly.
Relating to "Another question is to create a variable y equals 0 for every observation in the whole dataset." - use next statement in a data step:
retain Y 0;
You are not consistent with your description.
You wrote "This dataset including 3 countries" but you mention 4 countries in the next lines.
and please relate to @PaigeMiller 's post too.
Hi @Shmuel
Sorry for such an inconsistency, it is 4 countries.
Many thanks.
It's a little difficult to figure out what is going on. Instead of providing a data set, you should write a DATALINES statement that captures all your issues. People here, including myself, will not download or open an attachment. I've included my own data set based off your information, but I don't know if I captured all the idiosyncrasies of your data set.
data have;
input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
infile datalines dlm = "," dsd;
datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;
proc sort data = have
out = have_2;
by geogn descending year; /* get the most recent year on top for each geogn */
run;
data have_3;
set have_2;
by geogn descending year; /* do the following for each geogn by year */
y = 0; /* creating variable 'y' that is equal to 0 for the entire data set */
if first.geogn then counter = 1; /* if it's the first country, then set counter = 1 */
else counter + 1; /* Otherwise, if it's the same country, increment by 1 */
if counter in (1:5) then x = 1; /* if the counter is in the first 5, create a flag called x and set to 1 */
else x = 0; /* otherwise set it to 0. */
run;
Here's a helpful blog on by-group processing as well as first last processing.
Also, @Shmuel's method of using a retain statement is more efficient than what I included here.
Hi, does the below help using an example dataset:
data have;
input ename :$3. geogn :$15. year $ acc_pay acc_pay_tur acc_sta :$15. lags18;
infile datalines dlm = "," dsd;
num_year = input(year,best12.);
datalines;
MOQ,BELGIUM,2018,4122,6.0917808219,Local standards,2.33
MOQ,BELGIUM,2019,1231,6.0917808219,Local standards,14.22
MOQ,BELGIUM,2010,4122,6.0917808219,Local standards,1.21
MOQ,BELGIUM,2011,1231,6.0917808219,Local standards,10.71
MOQ,BELGIUM,2014,2317,6.0917808219,Local standards,3.71
MOQ,BELGIUM,2015,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,1999,4122,6.0917808219,Local standards,1.21
MOQ,AUSTRALIA,2000,1231,6.0917808219,Local standards,10.71
MOQ,AUSTRALIA,2002,2317,6.0917808219,Local standards,3.71
MOQ,AUSTRALIA,2004,781,13.70323168,Local standards,4.68
MOQ,AUSTRALIA,2005,1585,11.488588335,Local standards,4.58
MOQ,AUSTRALIA,2007,881,6.4921190893,IFRS,2.68
MOQ,AUSTRALIA,2008,893,4.3393461105,IFRS,1.78
;
run;
proc means data=have noprint nway;
class geogn;
var num_year;
output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;
proc sort data=have;
by ename geogn;
run;
data have2 (drop=num_:);
merge have mini_have;
by geogn;
retain y 0;
/* X variable = 1 if it's within last 5 years of that particular Country*/
if num_year_max - num_year < 5 then
x=1;
else x = 0;
run;
Hello @qoit !
Many thanks for your answer, I am able to understand it.
I just want to know if the code below is okay, I tested successfully but want to make sure my change does not cause any hidden problem that I did not notice enough
libname sample "C:\Users\pnguyen\Desktop\New folder";
data treatment;
set sample.treatment_sample;
num_year = input(year,best12.);
run;
proc means data=treatment noprint nway;
class geogn; /*I just class only by geogn because I do not know why I need to sort by ename*/
var num_year;
output out=mini_have (drop=_type_ _freq_) max=/autoname;
run;
proc sort data=treatment;
by geogn;
run;
data have2 (drop=num_:);
merge treatment mini_have;
by geogn;
retain y 0;
/* X variable = 1 if it's within last 5 years of that particular Country*/
if num_year_max - num_year < 5 then
x=1;
else x = 0;
run;
Many thanks and warmest regards.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.