BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

hi,

lastname and  firstname have missing values also include a lot of characters such as: ? [ ] | : ( ) '    . -  and i need to compress  them

i know #1 wont do what i need  but i did run it out of curousity and got:

#1 Table Y_TMI.TMI_KILL created, with 500475 rows and 7 columns.

 

#2 and #3   would do what i need but i am getting  different  results:


#2 Table Y_TMI.TMI_KILL created, with 500058 rows and 7 columns.
#3 Table Y_TMI.TMI_KILL created, with 500100 rows and 7 columns.

 

1.What is causing the  different  results in the 3 runs?

2.If i am to chose between #2 and #3 which one seems  to be more accurate ?

 

anyone  please? thx


proc sql;
create table y_tmi.tmi_kill as
select distinct
 compress(dg.EmailAdd) as EmailAdd,
 
     /*#1*/  /*cats(upcase(strip(compress(dg.FirstName))), upcase(strip(compress(dg.LastName))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#2*/ /*cats(upcase(strip(compress(dg.FirstName,,"ka"))), upcase(strip(compress(dg.LastName,,"ka"))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#3*/  /*cats(upcase(strip(compress(dg.FirstName,"?[]|:()'.- "))), upcase(strip(compress(dg.LastName,"?[]|:()'.- "))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,
 from y_tmi.tmi_data_growth dg
 left join y_cus.ALLIANZ a

 on a.CUSTNUM = dg.CustNum
 left join (select matchkey, contactid from y_cus.gi union select matchkey, contactid from y_cus.gi_driver) gi

 on a.matchkey = gi.matchkey
 left join y_gi.gi_contactid_xref xref

 on gi.contactid = xref.cd_contact_id;

quit;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

View solution in original post

2 REPLIES 2
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

ah right.

forgot about the "distinct"

 

Thanks Shmuel

sas-innovate-wordmark-2025-midnight.png

Register Today!

Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.


Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1223 views
  • 0 likes
  • 2 in conversation