Hi all,
I have issues on my data.
What I am trying to do is data cleaning and the data I received is a raw file.
The example for the data I am working is as below:
DATA TEMP;
INPUT @1 Ref 7.
@9 ID 5.
@14 LockerNo 12.
@28 Date ddmmyy10.
@39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952 123456789123 08/12/2001 XYZ PRODUCTION
4545154 1952 987654321987 26/09/2001 DEF Co.
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY
5451654 96321 125874934589 05/10/2006 ABC AGENCY
5451654 96321 123467984352 23/11/2004 ABC AGENCu
5451654 96321 785645464644 17/01/2005 ABC AGENCY
7895412 96321 123789654753 21/12/2000 ABC AGENCY
;
RUN;
What I am trying to achieve is:
# Note that Everything will be grouped by ID.
1) If the first 5 characters of Cust_Name or the last 5 characters of Cust_Name is the same, then a new variable, Final will have the value of Ref with latest Date
Below is the table I am trying to achieve:
Ref | ID | Date | Cust_Name | Final |
4511445 | 1952 | 8/12/2001 | XYZ PRODUCTION | 4511445 |
4545154 | 1952 | 26/9/2001 | DEF Co. | 4545154 |
7895412 | 96321 | 12/10/2000 | PINK Enterprise | 7895412 |
7895412 | 96321 | 21/12/2000 | GbC AGENCY | 5451654 |
5451654 | 96321 | 5/10/2006 | ABC AGENCY | 5451654 |
5451654 | 96321 | 23/11/2004 | ABC AGENCu | 5451654 |
5451654 | 96321 | 17/1/2005 | ABC AGENCY | 5451654 |
5451654 | 96321 | 21/12/2000 | ABC AGENCY | 5451654 |
Cust_Name is a character string and might have many patterns, so after looking at the data, some might have either for the first 5 characters or the last 5 characters.
I REALLY APPRECIATE YOUR HELP ON THIS!
Btw, I am using SAS Enterprise Guide 7.1
What is Final for this data?
ref ID Cust_Name Final 1 1 12345ABCDE ? 2 1 12345ABCDF ? 3 1 12346ABCDF ?
Cust_name from obs 1 and 2 share the first 5 chars and obs 2 and 3 share the last five chars.
Hi PG,
Final will have the value which parent it should refer to. (The parent should refer to the Ref which have the latest date)
Cust_name (last column) from observation 1 and 2 does not share the first five characters. as one is XYZ Production and PINK DEF Co. Since they do not share or have same 5 character for the first or last, it have their own ref as parent.
This is the code to get the variable Final. But the problem is they read cust_name seperately even though they just differs 2/3 characters.
proc sort data=temp;
by id cust_name descending date;
run;
data want;
set temp;
by id cust_name;
retain final;
if first.cust_name then final = ref;
run;
proc sort data=want;
by id cust_name date;
run;
proc print data=want noobs;
run;
after you run the above code, what i want is observation 7 to copy 5451654 for Final.
why? because they are in the same ID, only some of the characters differs.
Now I am looking for some codes which might help to group the data even though some of the customer_name differs one/two characters.
Appreciate your help!
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.