Hello,
I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.
For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.
I want to drop the enterprise from my data if any of these years shows a null value for ETR.
I have written:
data ETR_developer;
set ETR_developer;
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
run;
Any suggestions on a macro or a loop, I'm really new, to make this one line?
I've done the same for generating the variable:
It may be simple, but I can use this logic in the future! Thanks
1. Use variable lists
2. Use functions
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.
if nmiss(of etr2011-etr2015) >0 then delete;
For your other calculations look at using arrays instead.
@KG2 wrote:
Hello,
I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.
For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.
I want to drop the enterprise from my data if any of these years shows a null value for ETR.
I have written:
data ETR_developer;
set ETR_developer;
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
run;
Any suggestions on a macro or a loop, I'm really new, to make this one line?
I've done the same for generating the variable:
ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);format ETR2011 4.2;ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);format ETR2012 4.2;ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);format ETR2013 4.2;ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);format ETR2014 4.2;ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);format ETR2015 4.2;
It may be simple, but I can use this logic in the future! Thanks
1. Use variable lists
2. Use functions
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.
if nmiss(of etr2011-etr2015) >0 then delete;
For your other calculations look at using arrays instead.
@KG2 wrote:
Hello,
I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.
For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.
I want to drop the enterprise from my data if any of these years shows a null value for ETR.
I have written:
data ETR_developer;
set ETR_developer;
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
run;
Any suggestions on a macro or a loop, I'm really new, to make this one line?
I've done the same for generating the variable:
ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);format ETR2011 4.2;ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);format ETR2012 4.2;ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);format ETR2013 4.2;ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);format ETR2014 4.2;ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);format ETR2015 4.2;
It may be simple, but I can use this logic in the future! Thanks
Thank you!
"I write a lot of repetitive statements and I wanted to learn how best to make my code concise." - data modelling is the first thing to learn. Transposed data is rarely the best method of storing and processing data in code, it is fine for review but that is really the only time it is better. Look at what you have you have:
You firstly have a "year" element.
You then have a "number" associated with each year
(Not including any other variables as can't see those).
So the model would be:
YEAR VALUE
This simplifies things a lot as variables never change, just rows.
If you're interested in learning how to best structure your data I recommend the Hadley Wickham Tidy Data paper.
Thank you - this is so helpful and I am absolutely going to be studying this.
@KG2 wrote:
Hello,
I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.
For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.
I want to drop the enterprise from my data if any of these years shows a null value for ETR.
I have written:
data ETR_developer;
set ETR_developer;
if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;
run;
Any suggestions on a macro or a loop, I'm really new, to make this one line?
I've done the same for generating the variable:
ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);format ETR2011 4.2;ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);format ETR2012 4.2;ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);format ETR2013 4.2;ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);format ETR2014 4.2;ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);format ETR2015 4.2;
It may be simple, but I can use this logic in the future! Thanks
Code like this is one reason you will see multiple comments about "data should not exist in your variable name" and that data that looks like
Id etr2011 etr2012 etr2013 etr2014 c9230_2011 c9230_2012 c9230_2013 c9230_2014
should be
id year etr c9230;
or similar.
Then instead of multiple lines like:
ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
which likely need to be added each year
you would have
ETR = ((C92300+C92400)/C58460);
and the code would not need to change each year.
The delete code would be simple:
if missing(etr) then delete; or
if etr=. then delete;
Multiyear Reports would easily be created with the YEAR variable providing a grouping role.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.