BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KG2
Calcite | Level 5 KG2
Calcite | Level 5

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 

1 ACCEPTED SOLUTION

Accepted Solutions
Reeza
Super User

1. Use variable lists

2. Use functions

 

if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;

This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.

 

if nmiss(of etr2011-etr2015) >0 then delete;

For your other calculations look at using arrays instead. 


@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


 

View solution in original post

6 REPLIES 6
Reeza
Super User

1. Use variable lists

2. Use functions

 

if ETR2011 =. then delete;
if ETR2012 =. then delete;
if ETR2013 =. then delete;
if ETR2014 =. then delete;
if ETR2015 =. then delete;

This is equivalent to saying if any value is missing then delete. Instead, use the NMISS() function to count the number of missing and if any are missing then delete the row.

 

if nmiss(of etr2011-etr2015) >0 then delete;

For your other calculations look at using arrays instead. 


@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


 

KG2
Calcite | Level 5 KG2
Calcite | Level 5

Thank you!

RW9
Diamond | Level 26 RW9
Diamond | Level 26

"I write a lot of repetitive statements and I wanted to learn how best to make my code concise." - data modelling is the first thing to learn.  Transposed data is rarely the best method of storing and processing data in code, it is fine for review but that is really the only time it is better.  Look at what you have you have:
You firstly have a "year" element.

You then have a "number" associated with each year

(Not including any other variables as can't see those).

So the model would be:

YEAR   VALUE

 

This simplifies things a lot as variables never change, just rows.

Reeza
Super User

If you're interested in learning how to best structure your data I recommend the Hadley Wickham Tidy Data paper.

https://www.jstatsoft.org/article/view/v059i10/v59i10.pdf

KG2
Calcite | Level 5 KG2
Calcite | Level 5

Thank you - this is so helpful and I am absolutely going to be studying this.

 

ballardw
Super User

@KG2 wrote:

Hello,

 

I am building a program to clean my data for analysis. I write a lot of repetitive statements and I wanted to learn how best to make my code concise.

 

For example: I have a variable ETR2011, ETR2012, ETR2013... to 2015 which is the "effective tax rate" each year.

 

I want to drop the enterprise from my data if any of these years shows a null value for ETR.

 

I have written:

 

data ETR_developer;

set ETR_developer;

if ETR2011 =. then delete;

if ETR2012 =. then delete;

if ETR2013 =. then delete;

if ETR2014 =. then delete;

if ETR2015 =. then delete;

run;

 

Any suggestions on a macro or a loop, I'm really new, to make this one line?

 

I've done the same for generating the variable:

 

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);
format ETR2011 4.2;
ETR2012= ((C92300_2012+C92400_2012)/C58460_2012);
format ETR2012 4.2;
ETR2013= ((C92300_2013+C92400_2013)/C58460_2013);
format ETR2013 4.2;
ETR2014= ((C92300_2014+C92400_2014)/C58460_2014);
format ETR2014 4.2;
ETR2015= ((C92300_2015+C92400_2015)/C58460_2015);
format ETR2015 4.2;

 

It may be simple, but I can use this logic in the future! Thanks

 


Code like this is one reason you will see multiple comments about "data should not exist in your variable name" and that data that looks like

Id etr2011 etr2012 etr2013 etr2014 c9230_2011 c9230_2012  c9230_2013  c9230_2014

should be

id year etr c9230;

or similar.

Then instead of multiple lines like:

ETR2011= ((C92300_2011+C92400_2011)/C58460_2011);

which likely need to be added each year

you would have

ETR = ((C92300+C92400)/C58460);

and the code would not need to change each year.

The delete code would be simple:

if missing(etr) then delete; or

if etr=. then delete;

Multiyear Reports would easily be created with the YEAR variable providing a grouping role.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1407 views
  • 3 likes
  • 4 in conversation