BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

hi,

lastname and  firstname have missing values also include a lot of characters such as: ? [ ] | : ( ) '    . -  and i need to compress  them

i know #1 wont do what i need  but i did run it out of curousity and got:

#1 Table Y_TMI.TMI_KILL created, with 500475 rows and 7 columns.

 

#2 and #3   would do what i need but i am getting  different  results:


#2 Table Y_TMI.TMI_KILL created, with 500058 rows and 7 columns.
#3 Table Y_TMI.TMI_KILL created, with 500100 rows and 7 columns.

 

1.What is causing the  different  results in the 3 runs?

2.If i am to chose between #2 and #3 which one seems  to be more accurate ?

 

anyone  please? thx


proc sql;
create table y_tmi.tmi_kill as
select distinct
 compress(dg.EmailAdd) as EmailAdd,
 
     /*#1*/  /*cats(upcase(strip(compress(dg.FirstName))), upcase(strip(compress(dg.LastName))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#2*/ /*cats(upcase(strip(compress(dg.FirstName,,"ka"))), upcase(strip(compress(dg.LastName,,"ka"))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,*/
      /*#3*/  /*cats(upcase(strip(compress(dg.FirstName,"?[]|:()'.- "))), upcase(strip(compress(dg.LastName,"?[]|:()'.- "))), dg.dob,strip(POSTALCODE)) as NameDOBpostal,
 from y_tmi.tmi_data_growth dg
 left join y_cus.ALLIANZ a

 on a.CUSTNUM = dg.CustNum
 left join (select matchkey, contactid from y_cus.gi union select matchkey, contactid from y_cus.gi_driver) gi

 on a.matchkey = gi.matchkey
 left join y_gi.gi_contactid_xref xref

 on gi.contactid = xref.cd_contact_id;

quit;
run;

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

View solution in original post

2 REPLIES 2
Shmuel
Garnet | Level 18

On #1 compress removes only spaces.

On #2 compress keeps only alphnumeric characters

On #3 compress removes special characters defined within the command

 

Each of them get different values.

When you use select distinct - you get different number of observations 

as a result of above. in #3 you haven't define all possible non alphanumeric characters,

therefore you get some in between count of obervations:  #1 < #3 < #2

Tal
Pyrite | Level 9 Tal
Pyrite | Level 9

ah right.

forgot about the "distinct"

 

Thanks Shmuel

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 887 views
  • 0 likes
  • 2 in conversation