BookmarkSubscribeRSS Feed
AieuYuhara
Fluorite | Level 6

Hi all, 

 

I have issues on my data.

What I am trying to do is data cleaning and the data I received is a raw file. 

 

The example for the data I am working is as below:

DATA TEMP;
INPUT @1 Ref 7. 
	  @9 ID 5. 
	  @14 LockerNo 12.
	  @28 Date ddmmyy10.
	  @39 Cust_Name $15.
;
format date ddmmyy10.;
DATALINES;
4511445 1952  123456789123 08/12/2001 XYZ PRODUCTION 
4545154 1952  987654321987 26/09/2001 DEF Co.        
7895412 96321 456123789456 12/10/2000 PINK Enterprise
7895412 96321 123789654753 21/12/2000 GbC AGENCY     
5451654 96321 125874934589 05/10/2006 ABC AGENCY     
5451654 96321 123467984352 23/11/2004 ABC AGENCu     
5451654 96321 785645464644 17/01/2005 ABC AGENCY     
7895412 96321 123789654753 21/12/2000 ABC AGENCY     
;
RUN; 

What I am trying to achieve is:

# Note that Everything will be grouped by ID. 

1) If the first 5 characters of Cust_Name or the last 5 characters of Cust_Name is the same, then a new variable, Final will have the value of Ref with latest Date

 

Below is the table I am trying to achieve:

RefIDDateCust_NameFinal
451144519528/12/2001XYZ PRODUCTION4511445
4545154195226/9/2001DEF Co.4545154
78954129632112/10/2000PINK Enterprise7895412
78954129632121/12/2000GbC AGENCY5451654
5451654963215/10/2006ABC AGENCY5451654
54516549632123/11/2004ABC AGENCu5451654
54516549632117/1/2005ABC AGENCY5451654
54516549632121/12/2000ABC AGENCY5451654

 

Cust_Name is a character string and might have many patterns, so after looking at the data, some might have either for the first 5 characters or the last 5 characters. 

 

I REALLY APPRECIATE YOUR HELP ON THIS!

 

Btw, I am using SAS Enterprise Guide 7.1

2 REPLIES 2
PGStats
Opal | Level 21

What is Final for this data?

 

ref ID Cust_Name Final
1 1 12345ABCDE ?
2 1 12345ABCDF ?
3 1 12346ABCDF ?

Cust_name from obs 1 and 2 share the first 5 chars and obs 2 and 3 share the last five chars.

PG
AieuYuhara
Fluorite | Level 6

Hi PG,

 

Final will have the value which parent it should refer to. (The parent should refer to the Ref which have the latest date)

 

Cust_name (last column) from observation 1 and 2 does not share the first five characters. as one is XYZ Production and PINK DEF Co. Since they do not share or have same 5 character for the first or last, it have their own ref as parent.

 

This is the code to get the variable Final. But the problem is they read cust_name seperately even though they just differs 2/3 characters.

 

proc sort data=temp;
by id cust_name descending date;
run;

data want;
set temp;
by id cust_name;
retain final;
if first.cust_name then final = ref;
run;

proc sort data=want;
by id cust_name date;
run;

proc print data=want noobs;
run;

after you run the above code, what i want is observation 7 to copy 5451654 for Final. 
why? because they are in the same ID, only some of the characters differs.

 

Now I am looking for some codes which might help to group the data even though some of the customer_name differs one/two characters.

 

Appreciate your help!

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 2 replies
  • 950 views
  • 0 likes
  • 2 in conversation