BookmarkSubscribeRSS Feed
AieuYuhara
Fluorite | Level 6

Hi all, 

 

I have issues on my data.

What I am trying to do is data cleaning and the data I received is a raw file. 

 

The example for the data I am working is as below:

DATA TEMP;
INPUT @1 Ref 7. 
	  @9 ID 5. 
	  @14 LockerNo 12.
	  @28 Date ddmmyy10.
	  @39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952  123456789123 08/12/2001 XYZ PRODUCTION 
4545154 1952  987654321987 26/09/2001 DEF Co.        
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY     
5451654 96321 125874934589 05/10/2006 ABC AGENCY     
5451654 96321 123467984352 23/11/2004 ABC AGENCu     
5451654 96321 785645464644 17/01/2005 ABC AGENCY     
7895412 96321 123789654753 21/12/2000 ABC AGENCY     
;
RUN; 

What I am trying to achieve is:

# Note that Everything will be grouped by ID. 

1) If the first 5 characters of Cust_Name or the last 5 characters of Cust_Name is the same, then a new variable, Final will have the value of Ref with latest Date

 

Below is the table I am trying to achieve:

RefIDDateCust_NameFinal
451144519528/12/2001XYZ PRODUCTION4511445
4545154195226/9/2001DEF Co.4545154
78954129632112/10/2000PINK Enterprise7895412
78954129632121/12/2000GbC AGENCY5451654
5451654963215/10/2006ABC AGENCY5451654
54516549632123/11/2004ABC AGENCu5451654
54516549632117/1/2005ABC AGENCY5451654
54516549632121/12/2000ABC AGENCY5451654

 

Cust_Name is a character string and might have many patterns, so after looking at the data, some might have either for the first 5 characters or the last 5 characters. 

 

I REALLY APPRECIATE YOUR HELP ON THIS!

 

Btw, I am using SAS Enterprise Guide 7.1

2 REPLIES 2
PGStats
Opal | Level 21

What is Final for this data?

 

ref ID Cust_Name Final
1 1 12345ABCDE ?
2 1 12345ABCDF ?
3 1 12346ABCDF ?

Cust_name from obs 1 and 2 share the first 5 chars and obs 2 and 3 share the last five chars.

PG
AieuYuhara
Fluorite | Level 6

Hi PG,

 

Final will have the value which parent it should refer to. (The parent should refer to the Ref which have the latest date)

 

Cust_name (last column) from observation 1 and 2 does not share the first five characters. as one is XYZ Production and PINK DEF Co. Since they do not share or have same 5 character for the first or last, it have their own ref as parent.

 

This is the code to get the variable Final. But the problem is they read cust_name seperately even though they just differs 2/3 characters.

 

proc sort data=temp;
by id cust_name descending date;
run;

data want;
set temp;
by id cust_name;
retain final;
if first.cust_name then final = ref;
run;

proc sort data=want;
by id cust_name date;
run;

proc print data=want noobs;
run;

after you run the above code, what i want is observation 7 to copy 5451654 for Final. 
why? because they are in the same ID, only some of the characters differs.

 

Now I am looking for some codes which might help to group the data even though some of the customer_name differs one/two characters.

 

Appreciate your help!

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 1430 views
  • 0 likes
  • 2 in conversation