09-28-2016 09:25 AM
How can i transform this proc sql in data step, to check witch one is faster ?
PROC SQL; CREATE TABLE MMM2 AS SELECT DISTINCT A.X_ACCT_KEY, ( SELECT B.IND_QTD_SMS FROM MMM B WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND B.IND_QTD_SMS NOT IN ('0', '') ) AS IND_SMS, ( SELECT B.IND_QTD_EMAIL FROM MMM B WHERE B.X_ACCT_KEY = A.X_ACCT_KEY AND B.IND_QTD_EMAIL NOT IN ('0', '') ) AS IND_EMAIL FROM MMM A; QUIT;
Thanks in Advance
09-28-2016 09:38 AM
I don't have the time to program this, but here are some questions to consider that will make life easier for some of the other posters.
Evidently you can have multiple records per X_ACCT_KEY. In that case, what matches do you want to keep? For example, what if the same incoming record has both IND_QTD_SMS='0' and IND_QTS_EMAIL='12345'? What if a second record for the same X_ACCT_KEY has both IND_QTD_SMS='2458' and IND_QTS_EMAIL=' '?
09-28-2016 09:45 AM - edited 09-28-2016 09:46 AM
I want to transform this:
Key | SMS | Email
6456545 | 2 |
6456545 | | 2
Key | SMS | Email
6456545 | 2 | 2
With this proc sql i can do this, but it takes a long time to run.
I'd like to test with datastep.
09-28-2016 03:44 PM
Given your sample data, unless you've oversimplified wouldn't it be:
create table want as
select key, max(sms) as sms, max(email) as email
group by key;
update have(obs=0) have;
09-28-2016 04:12 PM
Reeza, the possibility of '0' in the data (which must be ignored ... see the original post) complicates things. In the case of UPDATE, there's an obvious problem where '0' could replace a valid value. In the case of SQL, it depends on both the operating system and what might be in the realm of valid values. On some operating systems, '0' is lower than letters. On some, it is greater.
09-28-2016 09:44 AM
Can you post sample data for testing? I'm fairly certain I understand the query but wouldn't bet on it. Especially when it can be easily tested. Cross joins are an interesting beast in a data step.
09-28-2016 09:48 AM
Sorry, what is the question, we don't do contract work here? At a breif glance, datastep each subquery for the where clause, then merge all three datasets, then sort nodupkey.
09-28-2016 10:36 AM - edited 09-28-2016 02:02 PM
I believe this will work, but without data to test it's hard to say for certain.
data MMM2; set MMM; keep X_ACCT_KEY IND_SMS IND_EMAIL; if IND_QTD_SMS IN ('0', '') then IND_SMS=''; else IND_SMS=IND_QTD_SMS; if IND_QTD_EMAIL IN ('0', '') then IND_EMAIL=''; else IND_EMAIL=IND_QTD_EMAIL; run; proc sort noduplicates; by X_ACCT_KEY IND_SMS IND_EMAIL; run;
09-28-2016 10:44 AM
OK, given your subsequent description of the goal you are trying to reach:
if ind_qtd_sms not in ('0', ' ') then new_sms=ind_qtd_sms;
if ind_qtd_email not in ('0', ' ') then new_email = ind_qts_email;
retain new_sms new_email;
drop ind_qtd_sms ind_qts_email;
rename new_sms = ind_qtd_sms new_email = ind_qts_email;
It requires that your data set is already sorted, but virtually any DATA step approach will require that. If there are multiple records (not just 2) for an account key, you still end up with only one record for the account key. And it should be MUCH faster than the SQL approach.
09-30-2016 10:44 AM
I solved this by creating two separated data sets:
DATA SMS_30; SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_MP') DROP=IND_QTD_EMAIL);RUN;
DATA EMAIL_30;SET TEST_30D_V3 (WHERE=(CHANNEL_CD EQ '_EM') DROP=IND_QTD_SMS);RUN;
PROC SORT DATA=TEST_30D_V3;
PROC SORT DATA=SMS_30;
PROC SORT DATA=EMAIL_30;
MERGE SMS_30 (IN=A DROP=MOMENTO)
EMAIL_30 (IN=B DROP=MOMENTO)
TEST_30D_V3 (IN=C KEEP=X_ACCT_KEY MOMENTO);
Thanks in Advance