DATA Step, Macro, Functions and more

Sub Query with Data step

Reply
Regular Contributor
Posts: 212

Sub Query with Data step

Hi mates,

 

How can i transform this proc sql in data step, to check witch one is faster ?

 

PROC SQL;
	CREATE TABLE MMM2 AS
	SELECT DISTINCT A.X_ACCT_KEY,	(	SELECT B.IND_QTD_SMS
										FROM MMM B
										WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND
											  B.IND_QTD_SMS NOT IN ('0', '')
									) AS IND_SMS,
									(	SELECT B.IND_QTD_EMAIL
										FROM MMM B
										WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND
											  B.IND_QTD_EMAIL NOT IN ('0', '')
									) AS IND_EMAIL	
	FROM MMM A;
QUIT;

Thanks in Advance

Super User
Posts: 5,498

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

I don't have the time to program this, but here are some questions to consider that will make life easier for some of the other posters.

 

Evidently you can have multiple records per X_ACCT_KEY.  In that case, what matches do you want to keep?  For example, what if the same incoming record has both IND_QTD_SMS='0' and IND_QTS_EMAIL='12345'?  What if a second record for the same X_ACCT_KEY has both IND_QTD_SMS='2458' and IND_QTS_EMAIL=' '?

Regular Contributor
Posts: 212

Re: Sub Query with Data step

[ Edited ]
Posted in reply to Astounding

I want to transform this:

 

Key           | SMS  |   Email

6456545   |    2     |     

6456545   |           |      2

 

Into this:

 

Key           | SMS  |   Email

6456545   |    2     |     2

 

 

With this proc sql i can do this, but it takes a long time to run.

I'd like to test with datastep.

 

Thanks

Super User
Posts: 19,772

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

Given your sample data, unless you've oversimplified wouldn't it be:

 

proc sql;

create table want as

select key, max(sms) as sms, max(email) as email

from have

group by key;

quit;

 

or 

 

data want;

update have(obs=0) have;

by key;

run;

Super User
Posts: 5,498

Re: Sub Query with Data step

Reeza, the possibility of '0' in the data (which must be ignored ... see the original post) complicates things.  In the case of UPDATE, there's an obvious problem where '0' could replace a valid value.  In the case of SQL, it depends on both the operating system and what might be in the realm of valid values.  On some operating systems, '0' is lower than letters.  On some, it is greater.

Super User
Posts: 19,772

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

Can you post sample data for testing? I'm fairly certain I understand the query but wouldn't bet on it. Especially when it can be easily tested. Cross joins are an interesting beast in a data step. 

Super User
Super User
Posts: 7,942

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

Sorry, what is the question, we don't do contract work here?  At a breif glance, datastep each subquery for the where clause, then merge all three datasets, then sort nodupkey.

Occasional Contributor
Posts: 14

Re: Sub Query with Data step

[ Edited ]
Posted in reply to DartibaliRodrigo

I believe this will work, but without data to test it's hard to say for certain.

 

data MMM2;
	set MMM;
	keep X_ACCT_KEY IND_SMS IND_EMAIL;
	if IND_QTD_SMS IN ('0', '') then IND_SMS=''; else IND_SMS=IND_QTD_SMS;
	if IND_QTD_EMAIL IN ('0', '') then IND_EMAIL=''; else IND_EMAIL=IND_QTD_EMAIL;
run;

proc sort noduplicates; by X_ACCT_KEY IND_SMS IND_EMAIL; run;
Super User
Posts: 5,498

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

OK, given your subsequent description of the goal you are trying to reach:

 

data mmm2;

set mmm;

by x_acct_key;

if ind_qtd_sms not in ('0', ' ') then new_sms=ind_qtd_sms;

if ind_qtd_email not in ('0', ' ') then new_email = ind_qts_email;

retain new_sms new_email;

if last.x_aact_key;

output;

drop ind_qtd_sms ind_qts_email;

rename new_sms = ind_qtd_sms new_email = ind_qts_email;

new_sms=' ';

new_email=' ';

run;

 

It requires that your data set is already sorted, but virtually any DATA step approach will require that.  If there are multiple records (not just 2) for an account key, you still end up with only one record for the account key.  And it should be MUCH faster than the SQL approach.

Regular Contributor
Posts: 212

Re: Sub Query with Data step

Posted in reply to Astounding

I solved this by creating two separated data sets:

 

DATA SMS_30; SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_MP') DROP=IND_QTD_EMAIL);RUN;
DATA EMAIL_30;SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_EM') DROP=IND_QTD_SMS);RUN;

PROC SORT DATA=TEST_30D_V3;
BY X_ACCT_KEY;
RUN;

PROC SORT DATA=SMS_30;
BY X_ACCT_KEY;
RUN;

PROC SORT DATA=EMAIL_30;
BY X_ACCT_KEY;
RUN;

DATA TESTE_EV_30D(DROP=CHANNEL_CD);
MERGE SMS_30 (IN=A DROP=MOMENTO)
EMAIL_30 (IN=B DROP=MOMENTO)
TEST_30D_V3 (IN=C KEEP=X_ACCT_KEY MOMENTO);
BY X_ACCT_KEY;
RUN;

 

Thanks in Advance

Super User
Super User
Posts: 7,942

Re: Sub Query with Data step

Posted in reply to DartibaliRodrigo

No need to shout, code runs just as well in lower case with consistent identations Smiley Tongue

Ask a Question
Discussion stats
  • 10 replies
  • 861 views
  • 6 likes
  • 5 in conversation