BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

hi,

lastname and  firstname have missing values also include a lot of characters such as: ? [ ] | : ( ) '    . -  and i need to compress  them

i know #1 wont do what i need  but i did run it out of curousity and got:

#1 Table Y_TMI.TMI_KILL created, with 500475 rows and 7 columns.

 

#2 and #3   would do what i need but i am getting  different  results:


#2 Table Y_TMI.TMI_KILL created, with 500058 rows and 7 columns.
#3 Table Y_TMI.TMI_KILL created, with 500100 rows and 7 columns.

 

1.What is causing the  different  results in the 3 runs?

2.If i am to chose between #2 and #3 which one seems  to be more accurate ?

 

anyone  please? thx


proc sql;
create table y_tmi.tmi_kill as
select distinct
 compress(dg.EmailAdd) as EmailAdd,
 
     /*#1*/  /*cats(upcase(strip(compress(dg.FirstName))), upcase(strip(compress(dg.LastName))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#2*/ /*cats(upcase(strip(compress(dg.FirstName,,"ka"))), upcase(strip(compress(dg.LastName,,"ka"))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#3*/  /*cats(upcase(strip(compress(dg.FirstName,"?[]|:()'.- "))), upcase(strip(compress(dg.LastName,"?[]|:()'.- "))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,
 from y_tmi.tmi_data_growth dg
 left join y_cus.ALLIANZ a

 on a.CUSTNUM = dg.CustNum
 left join (select matchkey, contactid from y_cus.gi union select matchkey, contactid from y_cus.gi_driver) gi

 on a.matchkey = gi.matchkey
 left join y_gi.gi_contactid_xref xref

 on gi.contactid = xref.cd_contact_id;

quit;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

View solution in original post

2 REPLIES 2
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

ah right.

forgot about the "distinct"

 

Thanks Shmuel

hackathon24-white-horiz.png

The 2025 SAS Hackathon Kicks Off on June 11!

Watch the live Hackathon Kickoff to get all the essential information about the SAS Hackathon—including how to join, how to participate, and expert tips for success.

YouTube LinkedIn

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1431 views
  • 0 likes
  • 2 in conversation