Looking for more efficient alternative to datastep keep/drop

Valued Guide
Posts: 858
I am using the code below. Please note the drop=_d: and keep=_: options.

This affects a large number of columns: the step generates a table with 1,999,410 observations and 465 variables, which later needs to be sorted. It is by far the slowest part of my process. Right now I'm using Excel copy/paste to write out a SQL UNION statement with an ORDER BY at the end, and I'm going to test its performance in comparison. I am wondering if there is a better way.

 

 

data first_lien_combine_&monyear(drop=_d:);
set First_Lien_Rules_ETL_&propdate (keep= sys loan_number Run_Date reporting_month _: )
    cp_rules_&monyear (keep = sys loan_number Run_Date reporting_month _: );
run;

 

Thank You in Advance,

 

Mark
quit;

 



All Replies
Super User
Posts: 10,521

Re: Looking for more efficient alternative to datastep keep/drop

Since you do not show how the _d or _ variables are created it is a tad difficult to make suggestions.

 

You also don't mention what the following sort criteria could be in terms of those indeterminate variables.

Valued Guide
Posts: 858

Re: Looking for more efficient alternative to datastep keep/drop

The keep list contains 480 variables of this form:

'_131_All Loans'n,
'_132_All Loans'n,
'_132.1_All Loans'n,
'_133_All Loans'n,
'_133.2_All Loans'n,
'_133.3_All Loans'n,
'_134_All Loans'n,
'_135_All Loans'n,

 

The list of drops is df0 to df38.

 

On a side note, my UNION ALL isn't working. The error says:

 

ERROR: Ambiguous reference, column '_14.1_All loans'n is in more than one table.

 

But the column is in both tables, as it should be. I copied and pasted the list from Excel rather than writing out all those variables. I know it's ambiguous; that's why I'm using UNION ALL... I'm having problems all around.
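
A note on the error above: PROC SQL typically reports "Ambiguous reference" when a single SELECT lists more than one table in its FROM clause and a column is not qualified with a table name. The failing query is not shown in the thread, so this is only a guess at the cause. A minimal sketch of a UNION ALL in which each SELECT reads from exactly one table follows; the two name-literal columns stand in for the full list, and the ORDER BY keys are an assumption because the sort criteria are never stated.

```sas
proc sql;
  create table first_lien_combine_&monyear as
  /* First branch: one table in FROM, so no column can be ambiguous */
  select sys, loan_number, Run_Date, reporting_month,
         '_14.1_All loans'n, '_131_All Loans'n   /* list the remaining columns here */
    from First_Lien_Rules_ETL_&propdate
  union all
  /* Second branch: same columns, same order, from the other table */
  select sys, loan_number, Run_Date, reporting_month,
         '_14.1_All loans'n, '_131_All Loans'n   /* list the remaining columns here */
    from cp_rules_&monyear
  /* Assumed sort keys; replace with the real ones */
  order by loan_number, reporting_month;
quit;
```

UNION ALL matches columns by position, so both SELECT lists need the same columns in the same order.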

Solution
‎05-20-2016 02:36 PM
Super User
Posts: 17,868

Re: Looking for more efficient alternative to datastep keep/drop

You haven't provided enough information for us to help. 

 

It sounds like you're trying to append a set of tables?

 

The SAS code is very simple - it stacks two datasets together so there isn't much that can be optimized there. Can you presort your small files before appending? It will probably still take the same amount of time overall. 
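
One way the presort idea could look, as a rough sketch: sort each input on its own, then interleave the two sorted tables with SET plus BY so the combined table comes out already in order. The BY variables (loan_number, reporting_month) and the work.fl_sorted and work.cp_sorted names are assumptions made for illustration; the thread does not give the actual sort keys.

```sas
/* Sort each input separately, keeping only the needed columns. */
/* The BY variables are assumed, not taken from the thread.     */
proc sort data=First_Lien_Rules_ETL_&propdate
               (keep=sys loan_number Run_Date reporting_month _:)
          out=work.fl_sorted (drop=_d:);
  by loan_number reporting_month;
run;

proc sort data=cp_rules_&monyear
               (keep=sys loan_number Run_Date reporting_month _:)
          out=work.cp_sorted (drop=_d:);
  by loan_number reporting_month;
run;

/* Interleave: SET with BY reads the sorted inputs in key order, */
/* so the output is sorted by the same keys.                     */
data first_lien_combine_&monyear;
  set work.fl_sorted work.cp_sorted;
  by loan_number reporting_month;
run;
```

If the later sort uses the same keys, the interleaved output may not need to be sorted again, though that is worth confirming on the real data.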

Valued Guide
Posts: 858

Re: Looking for more efficient alternative to datastep keep/drop

Sorting both datasets before the append, then sorting after, made an exceptional difference. That is going to improve the total run time by a lot. Thanks very much. Have a great weekend. I have another question about the same code but will post it in another thread.

 

Thanks!

Super User
Posts: 5,085

Re: Looking for more efficient alternative to datastep keep/drop

I think this is worth a try:

 

data first_lien_combine_&monyear;
set First_Lien_Rules_ETL_&propdate (keep= sys loan_number Run_Date reporting_month _: drop=_d: )
    cp_rules_&monyear (keep = sys loan_number Run_Date reporting_month _:  drop=_d: );
run;

 

The idea is that your current code reads in all the _d variables, never uses them, and then drops them on output. While this will need to be tested, I think putting both options on the input datasets will bring in all the _ variables except for the _d variables.
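
A quick way to test this before running it on the full tables is to try the same option pair on a one-row toy dataset and list the variables that survive. Everything in this sketch (the work.toy and work.check datasets and the variables _x1, _d1 and other) is made up for the test.

```sas
/* Toy input: one _ variable to keep, one _d variable to drop, */
/* and one unrelated variable that should not come through.    */
data work.toy;
  sys = 'A';
  loan_number = 1;
  _x1 = 10;
  _d1 = 99;
  other = 5;
run;

/* Same keep=/drop= combination as in the suggestion above */
data work.check;
  set work.toy (keep=sys loan_number _: drop=_d:);
run;

/* SHORT prints just the variable names of the result */
proc contents data=work.check short;
run;
```

If _d1 is missing from the listing and _x1 is present, the combined options behave as hoped.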
