07-25-2017 09:52 PM
I have issues on my data.
What I am trying to do is data cleaning and the data I received is a raw file.
The example for the data I am working is as below:
DATA TEMP; INPUT @1 Ref 7. @9 ID 5. @14 LockerNo 12. @28 Date ddmmyy10. @39 Cust_Name $15. ; format date ddmmyy10.; DATALINES; 4511445 1952 123456789123 08/12/2001 XYZ PRODUCTION 4545154 1952 987654321987 26/09/2001 DEF Co. 7895412 96321 456123789456 12/10/2000 PINK Enterprise 7895412 96321 123789654753 21/12/2000 GbC AGENCY 5451654 96321 125874934589 05/10/2006 ABC AGENCY 5451654 96321 123467984352 23/11/2004 ABC AGENCu 5451654 96321 785645464644 17/01/2005 ABC AGENCY 7895412 96321 123789654753 21/12/2000 ABC AGENCY ; RUN;
What I am trying to achieve is:
# Note that Everything will be grouped by ID.
1) If the first 5 characters of Cust_Name or the last 5 characters of Cust_Name is the same, then a new variable, Final will have the value of Ref with latest Date
Below is the table I am trying to achieve:
Cust_Name is a character string and might have many patterns, so after looking at the data, some might have either for the first 5 characters or the last 5 characters.
I REALLY APPRECIATE YOUR HELP ON THIS!
Btw, I am using SAS Enterprise Guide 7.1
07-25-2017 10:34 PM
What is Final for this data?
ref ID Cust_Name Final 1 1 12345ABCDE ? 2 1 12345ABCDF ? 3 1 12346ABCDF ?
Cust_name from obs 1 and 2 share the first 5 chars and obs 2 and 3 share the last five chars.
07-25-2017 11:19 PM
Final will have the value which parent it should refer to. (The parent should refer to the Ref which have the latest date)
Cust_name (last column) from observation 1 and 2 does not share the first five characters. as one is XYZ Production and PINK DEF Co. Since they do not share or have same 5 character for the first or last, it have their own ref as parent.
This is the code to get the variable Final. But the problem is they read cust_name seperately even though they just differs 2/3 characters.
proc sort data=temp; by id cust_name descending date; run; data want; set temp; by id cust_name; retain final; if first.cust_name then final = ref; run; proc sort data=want; by id cust_name date; run; proc print data=want noobs; run;
after you run the above code, what i want is observation 7 to copy 5451654 for Final.
why? because they are in the same ID, only some of the characters differs.
Now I am looking for some codes which might help to group the data even though some of the customer_name differs one/two characters.
Appreciate your help!