hi,
lastname and firstname have missing values also include a lot of characters such as: ? [ ] | : ( ) ' . - and i need to compress them
i know #1 wont do what i need but i did run it out of curousity and got:
#1 Table Y_TMI.TMI_KILL created, with 500475 rows and 7 columns.
#2 and #3 would do what i need but i am getting different results:
#2 Table Y_TMI.TMI_KILL created, with 500058 rows and 7 columns.
#3 Table Y_TMI.TMI_KILL created, with 500100 rows and 7 columns.
1.What is causing the different results in the 3 runs?
2.If i am to chose between #2 and #3 which one seems to be more accurate ?
anyone please? thx
proc sql;
create table y_tmi.tmi_kill as
select distinct
compress(dg.EmailAdd) as EmailAdd,
/*#1*/ /*cats(upcase(strip(compress(dg.FirstName))), upcase(strip(compress(dg.LastName))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
/*#2*/ /*cats(upcase(strip(compress(dg.FirstName,,"ka"))), upcase(strip(compress(dg.LastName,,"ka"))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
/*#3*/ /*cats(upcase(strip(compress(dg.FirstName,"?[]|:()'.- "))), upcase(strip(compress(dg.LastName,"?[]|:()'.- "))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,
from y_tmi.tmi_data_growth dg
left join y_cus.ALLIANZ a
on a.CUSTNUM = dg.CustNum
left join (select matchkey, contactid from y_cus.gi union select matchkey, contactid from y_cus.gi_driver) gi
on a.matchkey = gi.matchkey
left join y_gi.gi_contactid_xref xref
on gi.contactid = xref.cd_contact_id;
quit;
run;
On #1 compress removes only spaces.
On #2 compress keeps only alphnumeric characters
On #3 compress removes special characters defined within the command
Each of them get different values.
When you use select distinct - you get different number of observations
as a result of above. in #3 you haven't define all possible non alphanumeric characters,
therefore you get some in between count of obervations: #1 < #3 < #2
On #1 compress removes only spaces.
On #2 compress keeps only alphnumeric characters
On #3 compress removes special characters defined within the command
Each of them get different values.
When you use select distinct - you get different number of observations
as a result of above. in #3 you haven't define all possible non alphanumeric characters,
therefore you get some in between count of obervations: #1 < #3 < #2
ah right.
forgot about the "distinct"
Thanks Shmuel
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.