BookmarkSubscribeRSS Feed
pinkyc
Calcite | Level 5

I have a dataset with about 6 million observations and this sas data step is insufficient so I was asked to run it as an sql query.  I am not familiar how to phrase if then statements in sql within a select statement as well as format variables.  The original code looks like this:

data convert_code;

format date1 date 2 mmddyy8. code $4.;

set userid.code; (this table is pulled from oracle)

if ID='X' then P='A'; else P='B';

If CAT in ('1','2','3') then CAT_group='ONE'; else CAT_GROUP='TWO';

if CAT_2 > '0' and CAT_2A >='1' then d=1; else d=0;

date1=datepart(date1);

date2=datepart(date2);

if code='3' and type_code in ('A','B','C') THEN DO;

if P_CODE in ('1','2','3') then P='1';

if P_CODE in ('4','5','6') then P='2';

end;

if code='4' and e_code in ('A') then DO;

if B_CODE in ('11','12','13') then P='3';

if B_CODE in ('14','15','16') then P='4';

end;

run;

7 REPLIES 7
Reeza
Super User

if ID='X' then P='A'; else P='B';


Becomes


Case when ID='X' then 'A'

else 'B' end as P format=$1.,


The rest is similar though you may need to nest case statements and format where appropriate.



art297
Opal | Level 21

Your code only takes about 3 seconds to run on an old machine. Why is it being considered insufficient?

Proc SQL may not do it any quicker.

ChrisNZ
Tourmaline | Level 20

Agree with Arthur, being asked to rewritte this is very odd. SQL may not be faster at all.

You need to optimise in other ways (try compressing tables for example).

SASKiwi
PROC Star

Does userid.code point to an actual Oracle table? If it does then SAS is translating your code into SQL anyway behind the scenes.

If userid.code is a SAS table extracted from Oracle first then you could combine the extraction and your code into one step and that would definitely be quicker.

Tom
Super User Tom
Super User

So 99% of the time that data step is taking is pulling the data from Oracle to SAS.  Re-coding the IF/THEN to CASE statements will not help with that.

The best thing you can do is to cut down on the amount of data you need to pull from Oracle to SAS.

Perhaps you could add a KEEP data set option to only move the variables you need?

Perhaps you could add a WHERE statement to only move the observations you need?

jakarman
Barite | Level 11

A mentioning Oracle and a SAS datastep. As Tom already stated you problem is not the datastep/SQL. Normally complex function with a datastep are faster and can doe more as SQL.

Your real problem is the data movement from oracle to SAS as the data is saved in the saswork the destination is unclear.
When the destination is SAS the goal is to minimize the needed data  (keep and where)
When the destination is back to Oracle the whole processing could be done in Oracle using implicit or explicit SQL.

With Oracle there is no support for in-database processing and publishing SAS formats. SAS(R) 9.4 In-Database Products: User's Guide, Fourth Edition   The approach for using formats with in-database is a nogo.

The only SAS fucntions being supported are of the Oracle SAS/ACCESS(R) 9.4 for Relational Databases: Reference, Sixth Edition 

The datepart (date1/date2) to be eliminiated (oracle/sas datetime difference) and only focus on that new variable code. 
The case and in of    SAS(R) 9.4 SQL Procedure User's Guide should be able to work well for a implicit one (most easy to code).

---->-- ja karman --<-----
appleorange
Obsidian | Level 7

Hi all who answered,

This is pinkyc, I no longer have access to the account due to my college discontinuing my email and can't reset my password so I apologize that I cannot tick helpful answers for this thread.  I think the issue as Tom correctly pointed out is that the table I was pulling from (userid.) didn't have 6 million observations, more like 60 million which was caused by a query that wasn't specific enough.  So I fixed that to actually 6 million and it ran fine.  Thanks everyone else too!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 3027 views
  • 1 like
  • 8 in conversation