hi,
lastname and firstname have missing values also include a lot of characters such as: ? [ ] | : ( ) ' . - and i need to compress them
i know #1 wont do what i need but i did run it out of curousity and got:
#1 Table Y_TMI.TMI_KILL created, with 500475 rows and 7 columns.
#2 and #3 would do what i need but i am getting different results:
#2 Table Y_TMI.TMI_KILL created, with 500058 rows and 7 columns.
#3 Table Y_TMI.TMI_KILL created, with 500100 rows and 7 columns.
1.What is causing the different results in the 3 runs?
2.If i am to chose between #2 and #3 which one seems to be more accurate ?
anyone please? thx
proc sql;
create table y_tmi.tmi_kill as
select distinct
compress(dg.EmailAdd) as EmailAdd,
/*#1*/ /*cats(upcase(strip(compress(dg.FirstName))), upcase(strip(compress(dg.LastName))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
/*#2*/ /*cats(upcase(strip(compress(dg.FirstName,,"ka"))), upcase(strip(compress(dg.LastName,,"ka"))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
/*#3*/ /*cats(upcase(strip(compress(dg.FirstName,"?[]|:()'.- "))), upcase(strip(compress(dg.LastName,"?[]|:()'.- "))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,
from y_tmi.tmi_data_growth dg
left join y_cus.ALLIANZ a
on a.CUSTNUM = dg.CustNum
left join (select matchkey, contactid from y_cus.gi union select matchkey, contactid from y_cus.gi_driver) gi
on a.matchkey = gi.matchkey
left join y_gi.gi_contactid_xref xref
on gi.contactid = xref.cd_contact_id;
quit;
run;
On #1 compress removes only spaces.
On #2 compress keeps only alphnumeric characters
On #3 compress removes special characters defined within the command
Each of them get different values.
When you use select distinct - you get different number of observations
as a result of above. in #3 you haven't define all possible non alphanumeric characters,
therefore you get some in between count of obervations: #1 < #3 < #2
On #1 compress removes only spaces.
On #2 compress keeps only alphnumeric characters
On #3 compress removes special characters defined within the command
Each of them get different values.
When you use select distinct - you get different number of observations
as a result of above. in #3 you haven't define all possible non alphanumeric characters,
therefore you get some in between count of obervations: #1 < #3 < #2
ah right.
forgot about the "distinct"
Thanks Shmuel
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.