Hello all,
I am cleaning customer data. My goal is to select the 'survivor' record for each customer based on data source, reliability, and recency. For example, I need to assign one survivor Identitynum to each Newid using these rules: Data Source '0' is more reliable than '1'; Suspicious '0' is more reliable than '1'; and among otherwise equal records, the largest ID is the most reliable because it is the most recent. I need to write code that transforms Table1 into Table2.
ID | Newid | Identitynum | Data Source | Suspicious |
1 | 1 | 13849 | 0 | 0 |
1392457 | 1 | 13844 | 1 | 0 |
1250118 | 9 | 14572 | 1 | 1 |
9 | 9 | 17532 | 1 | 0 |
37927 | 9 | 17532 | 1 | 0 |
1706203 | 10 | 29037 | 1 | 1 |
1733 | 10 | 37452 | 1 | 1 |
34609 | 10 | 44445 | 1 | 1 |
Table1
ID | newid | Identitynum | Data Source | Suspicious | Survivor |
1 | 1 | 13849 | 0 | 0 | 1 |
37927 | 9 | 17532 | 1 | 0 | 1 |
1706203 | 10 | 29037 | 1 | 1 | 1 |
Table2
Thank you!
PS: The main problem is assigning the survivor based on 'Suspicious' and recency when max(ID) is not the latest non-suspicious (Suspicious = 0) ID in the same Newid group.
If I understand correctly, the question is basically: get the records into the desired order, then select the first one in each Newid group.
This might give you a start:
proc sort data=have;
   by newid datasource suspicious descending id;
run;

data want;
   set have;
   by newid;
   if first.newid;
run;
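To try the idea on the sample from the question, here is a minimal sketch that reads Table1 into a dataset and applies the same sort-then-keep-first logic. The dataset names (have, sorted, want) and the variable name datasource (standing in for the "Data Source" column) are assumptions, so adjust them to your real data. The Survivor flag is added only to mirror the layout of Table2.

```sas
/* Sample rows copied from Table1 in the question.                  */
/* datasource is an assumed variable name for the "Data Source"     */
/* column; have, sorted and want are placeholder dataset names.     */
data have;
   input id newid identitynum datasource suspicious;
   datalines;
1 1 13849 0 0
1392457 1 13844 1 0
1250118 9 14572 1 1
9 9 17532 1 0
37927 9 17532 1 0
1706203 10 29037 1 1
1733 10 37452 1 1
34609 10 44445 1 1
;
run;

/* Within each newid: most reliable source first, then             */
/* non-suspicious before suspicious, then largest (newest) id.     */
proc sort data=have out=sorted;
   by newid datasource suspicious descending id;
run;

/* Keep only the first record of each newid group and flag it. */
data want;
   set sorted;
   by newid;
   if first.newid;
   survivor = 1;
run;

proc print data=want noobs;
run;
```

Because suspicious comes before descending id in the BY statement, a non-suspicious record is chosen over a suspicious one even when the suspicious record has the larger ID, which is the case raised in the PS. On the sample rows this should keep IDs 1, 37927 and 1706203, matching Table2.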