BookmarkSubscribeRSS Feed
DartRodrigo
Lapis Lazuli | Level 10

Hi mates,

 

How can i transform this proc sql in data step, to check witch one is faster ?

 

PROC SQL;
	CREATE TABLE MMM2 AS
	SELECT DISTINCT A.X_ACCT_KEY,	(	SELECT B.IND_QTD_SMS
										FROM MMM B
										WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND
											  B.IND_QTD_SMS NOT IN ('0', '')
									) AS IND_SMS,
									(	SELECT B.IND_QTD_EMAIL
										FROM MMM B
										WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND
											  B.IND_QTD_EMAIL NOT IN ('0', '')
									) AS IND_EMAIL	
	FROM MMM A;
QUIT;

Thanks in Advance

10 REPLIES 10
Astounding
PROC Star

I don't have the time to program this, but here are some questions to consider that will make life easier for some of the other posters.

 

Evidently you can have multiple records per X_ACCT_KEY.  In that case, what matches do you want to keep?  For example, what if the same incoming record has both IND_QTD_SMS='0' and IND_QTS_EMAIL='12345'?  What if a second record for the same X_ACCT_KEY has both IND_QTD_SMS='2458' and IND_QTS_EMAIL=' '?

DartRodrigo
Lapis Lazuli | Level 10

I want to transform this:

 

Key           | SMS  |   Email

6456545   |    2     |     

6456545   |           |      2

 

Into this:

 

Key           | SMS  |   Email

6456545   |    2     |     2

 

 

With this proc sql i can do this, but it takes a long time to run.

I'd like to test with datastep.

 

Thanks

Reeza
Super User

Given your sample data, unless you've oversimplified wouldn't it be:

 

proc sql;

create table want as

select key, max(sms) as sms, max(email) as email

from have

group by key;

quit;

 

or 

 

data want;

update have(obs=0) have;

by key;

run;

Astounding
PROC Star

Reeza, the possibility of '0' in the data (which must be ignored ... see the original post) complicates things.  In the case of UPDATE, there's an obvious problem where '0' could replace a valid value.  In the case of SQL, it depends on both the operating system and what might be in the realm of valid values.  On some operating systems, '0' is lower than letters.  On some, it is greater.

Reeza
Super User

Can you post sample data for testing? I'm fairly certain I understand the query but wouldn't bet on it. Especially when it can be easily tested. Cross joins are an interesting beast in a data step. 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorry, what is the question, we don't do contract work here?  At a breif glance, datastep each subquery for the where clause, then merge all three datasets, then sort nodupkey.

titus
Obsidian | Level 7

I believe this will work, but without data to test it's hard to say for certain.

 

data MMM2;
	set MMM;
	keep X_ACCT_KEY IND_SMS IND_EMAIL;
	if IND_QTD_SMS IN ('0', '') then IND_SMS=''; else IND_SMS=IND_QTD_SMS;
	if IND_QTD_EMAIL IN ('0', '') then IND_EMAIL=''; else IND_EMAIL=IND_QTD_EMAIL;
run;

proc sort noduplicates; by X_ACCT_KEY IND_SMS IND_EMAIL; run;
Astounding
PROC Star

OK, given your subsequent description of the goal you are trying to reach:

 

data mmm2;

set mmm;

by x_acct_key;

if ind_qtd_sms not in ('0', ' ') then new_sms=ind_qtd_sms;

if ind_qtd_email not in ('0', ' ') then new_email = ind_qts_email;

retain new_sms new_email;

if last.x_aact_key;

output;

drop ind_qtd_sms ind_qts_email;

rename new_sms = ind_qtd_sms new_email = ind_qts_email;

new_sms=' ';

new_email=' ';

run;

 

It requires that your data set is already sorted, but virtually any DATA step approach will require that.  If there are multiple records (not just 2) for an account key, you still end up with only one record for the account key.  And it should be MUCH faster than the SQL approach.

DartRodrigo
Lapis Lazuli | Level 10

I solved this by creating two separated data sets:

 

DATA SMS_30; SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_MP') DROP=IND_QTD_EMAIL);RUN;
DATA EMAIL_30;SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_EM') DROP=IND_QTD_SMS);RUN;

PROC SORT DATA=TEST_30D_V3;
BY X_ACCT_KEY;
RUN;

PROC SORT DATA=SMS_30;
BY X_ACCT_KEY;
RUN;

PROC SORT DATA=EMAIL_30;
BY X_ACCT_KEY;
RUN;

DATA TESTE_EV_30D(DROP=CHANNEL_CD);
MERGE SMS_30 (IN=A DROP=MOMENTO)
EMAIL_30 (IN=B DROP=MOMENTO)
TEST_30D_V3 (IN=C KEEP=X_ACCT_KEY MOMENTO);
BY X_ACCT_KEY;
RUN;

 

Thanks in Advance

RW9
Diamond | Level 26 RW9
Diamond | Level 26

No need to shout, code runs just as well in lower case with consistent identations Smiley Tongue

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 4614 views
  • 6 likes
  • 5 in conversation