DATA Step, Macro, Functions and more

Data Cleaning - Character String

Reply
Occasional Contributor
Posts: 16

Data Cleaning - Character String

Hi all, 

 

I have issues on my data.

What I am trying to do is data cleaning and the data I received is a raw file. 

 

The example for the data I am working is as below:

DATA TEMP;
INPUT @1 Ref 7. 
	  @9 ID 5. 
	  @14 LockerNo 12.
	  @28 Date ddmmyy10.
	  @39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952  123456789123 08/12/2001 XYZ PRODUCTION 
4545154 1952  987654321987 26/09/2001 DEF Co.        
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY     
5451654 96321 125874934589 05/10/2006 ABC AGENCY     
5451654 96321 123467984352 23/11/2004 ABC AGENCu     
5451654 96321 785645464644 17/01/2005 ABC AGENCY     
7895412 96321 123789654753 21/12/2000 ABC AGENCY     
;
RUN; 

What I am trying to achieve is:

# Note that Everything will be grouped by ID. 

1) If the first 5 characters of Cust_Name or the last 5 characters of Cust_Name is the same, then a new variable, Final will have the value of Ref with latest Date

 

Below is the table I am trying to achieve:

RefIDDateCust_NameFinal
451144519528/12/2001XYZ PRODUCTION4511445
4545154195226/9/2001DEF Co.4545154
78954129632112/10/2000PINK Enterprise7895412
78954129632121/12/2000GbC AGENCY5451654
5451654963215/10/2006ABC AGENCY5451654
54516549632123/11/2004ABC AGENCu5451654
54516549632117/1/2005ABC AGENCY5451654
54516549632121/12/2000ABC AGENCY5451654

 

Cust_Name is a character string and might have many patterns, so after looking at the data, some might have either for the first 5 characters or the last 5 characters. 

 

I REALLY APPRECIATE YOUR HELP ON THIS!

 

Btw, I am using SAS Enterprise Guide 7.1

Respected Advisor
Posts: 4,935

Re: Data Cleaning - Character String

Posted in reply to AieuYuhara

What is Final for this data?

 

ref ID Cust_Name Final
1 1 12345ABCDE ?
2 1 12345ABCDF ?
3 1 12346ABCDF ?

Cust_name from obs 1 and 2 share the first 5 chars and obs 2 and 3 share the last five chars.

PG
Occasional Contributor
Posts: 16

Re: Data Cleaning - Character String

Hi PG,

 

Final will have the value which parent it should refer to. (The parent should refer to the Ref which have the latest date)

 

Cust_name (last column) from observation 1 and 2 does not share the first five characters. as one is XYZ Production and PINK DEF Co. Since they do not share or have same 5 character for the first or last, it have their own ref as parent.

 

This is the code to get the variable Final. But the problem is they read cust_name seperately even though they just differs 2/3 characters.

 

proc sort data=temp;
by id cust_name descending date;
run;

data want;
set temp;
by id cust_name;
retain final;
if first.cust_name then final = ref;
run;

proc sort data=want;
by id cust_name date;
run;

proc print data=want noobs;
run;

after you run the above code, what i want is observation 7 to copy 5451654 for Final. 
why? because they are in the same ID, only some of the characters differs.

 

Now I am looking for some codes which might help to group the data even though some of the customer_name differs one/two characters.

 

Appreciate your help!

Ask a Question
Discussion stats
  • 2 replies
  • 119 views
  • 0 likes
  • 2 in conversation