Condition statement, making code more efficient

Accepted Solution Solved
Reply
Occasional Contributor KG2
Occasional Contributor
Posts: 18
Accepted Solution

Condition statement, making code more efficient

[ Edited ]

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


Accepted Solutions
Solution
‎03-13-2018 11:40 AM
Super User
Posts: 23,661

Re: Condition statement, making code more efficient

1. Use variable lists

2. Use functions

 

if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;

This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.

 

if nmiss(of etr2011-etr2015) >0 then delete;

For your other calculations look at using arrays instead. 


@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


 

View solution in original post


All Replies
Solution
‎03-13-2018 11:40 AM
Super User
Posts: 23,661

Re: Condition statement, making code more efficient

1. Use variable lists

2. Use functions

 

if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;

This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.

 

if nmiss(of etr2011-etr2015) >0 then delete;

For your other calculations look at using arrays instead. 


@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


 

Occasional Contributor KG2
Occasional Contributor
Posts: 18

Re: Condition statement, making code more efficient

Thank you!

Super User
Super User
Posts: 9,599

Re: Condition statement, making code more efficient

"I write a lot of repetitive statements and I wanted to learn how best to make my code concise." - data modelling is the first thing to learn.  Transposed data is rarely the best method of storing and processing data in code, it is fine for review but that is really the only time it is better.  Look at what you have you have:
You firstly have a "year" element.

You then have a "number" associated with each year

(Not including any other variables as can't see those).

So the model would be:

YEAR   VALUE

 

This simplifies things a lot as variables never change, just rows.

Super User
Posts: 23,661

Re: Condition statement, making code more efficient

If you're interested in learning how to best structure your data I recommend the Hadley Wickham Tidy Data paper.

https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf

Occasional Contributor KG2
Occasional Contributor
Posts: 18

Re: Condition statement, making code more efficient

Thank you - this is so helpful and I am absolutely going to be studying this.

 

Super User
Posts: 13,498

Re: Condition statement, making code more efficient


@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


Code like this is one reason you will see multiple comments about "data should not exist in your variable name" and that data that looks like

Id etr2011 etr2012 etr2013 etr2014 c9230_2011 c9230_2012  c9230_2013  c9230_2014

should be

id year etr c9230;

or similar.

Then instead of multiple lines like:

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);

which likely need to be added each year

you would have

ETR = ((C92300+C92400)/C58460);

and the code would not need to change each year.

The delete code would be simple:

if missing(etr) then delete; or

if etr=. then delete;

Multiyear Reports would easily be created with the YEAR variable providing a grouping role.

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 6 replies
  • 128 views
  • 3 likes
  • 4 in conversation