BookmarkSubscribeRSS Feed
PDevi
Fluorite | Level 6

EDIT:

 

The data isn't always arranged logically. Sometimes, the later date occurs first in the sequence, as seen for id 5.

 

Hello,

 

I have a table that looks as below:

 

id      doc           date_doc       

1       aaa        22-AUG-2013                            

1       bbb        29-AUG-2013

2       aaa        05-APR-2015                                                             

2       bbb        10-MAY-2015                                                              

3       aaa        18-DEC-2017   

3       aaa        28-DEC-2017   

4       aaa        04-FEB-2018

4       bbb        28-FEB-2018

5       bbb        18-SEP-2019

5       bbb        18-JUN-2019

 

Each id each has two docs "aaa" and "bbb", with the latter having a later date always. But there are data entry issues wherein the doc has been recorded incorrectly as seen for id 3 and 5. In case of id 3, the doc for 28-DEC-2017 should have been "bbb" and for id 5, the doc should be "aaa" for 18-JUN-2019.

 

How do I change the doc for all such cases? Again, the date for "aaa" has to be the earlier date and for "bbb" , it should be the later one. 

 

All help is appreciated.             

6 REPLIES 6
ed_sas_member
Meteorite | Level 14

Hi @PDevi ,

 

You can try this code. It drops variable "doc" and compute it from dates.

 

data have;
	input id doc $ date_doc;
	informat date_doc date11.;
	format date_doc date11.;
	cards;
1 aaa 22-AUG-2013                            
1 bbb 29-AUG-2013
2 aaa 05-APR-2015                                                             
2 bbb 10-MAY-2015                                                              
3 aaa 18-DEC-2017   
3 aaa 28-DEC-2017   
4 aaa 04-FEB-2018
4 bbb 28-FEB-2018
5 bbb 18-JUN-2019
5 bbb 18-SEP-2019
;
run;

proc sort data=have out=want (drop=doc);
	by id date_doc;
run;
data want; set want; by id; if first.id then doc="aaa"; else doc="bbb"; run;
PDevi
Fluorite | Level 6
Hi @ed_sas_member , the code changes the doc type to "aaa" for all ids and dates. I only want to change for those that are recorded incorrectly. 🙂
hashman
Ammonite | Level 13

@PDevi:

  1. sort by ID, DATE giving X
  2. create a sorted list of distinct DOC values available from the file giving Y
  3. for each record in every BY group from X, pick DOC from Y sequentially
data have ;                                                 
  input id doc :$3. date :date. ;                           
  format date yymmdd10.;                                    
  cards ;                                                   
1 aaa 22-AUG-2013                                           
1 bbb 29-AUG-2013                                           
2 aaa 05-APR-2015                                           
2 bbb 10-MAY-2015                                           
3 aaa 18-DEC-2017                                           
3 aaa 28-DEC-2017                                           
4 aaa 04-FEB-2018                                           
4 bbb 28-FEB-2018                                           
5 bbb 18-JUN-2019                                           
5 bbb 18-SEP-2019                                           
;                                                           
run ;                                                       
                                                            
proc sql ;                                                  
  create view x as select * from have order id, date ;      
  create view y as select distinct doc from have order doc ;
quit ;                                                      
                                                            
data want ;                                                 
  do p = 1 by 1 until (last.id) ;                           
    set x ;                                                 
    by id ;                                                 
    set y point = p ;                                       
    output ;                                                
  end ;                                                     
run ;                                                       

Caveat: It is assumed that there's at least one distinct DOC for every record in the largest BY group by ID. 

 

Kind regards

Paul D.

 

 

 

ballardw
Super User

@PDevi wrote:

EDIT:

 

The data isn't always arranged logically. Sometimes, the later date occurs first in the sequence, as seen for id 5.

 

Hello,

 

I have a table that looks as below:

 

id      doc           date_doc       

1       aaa        22-AUG-2013                            

1       bbb        29-AUG-2013

2       aaa        05-APR-2015                                                             

2       bbb        10-MAY-2015                                                              

3       aaa        18-DEC-2017   

3       aaa        28-DEC-2017   

4       aaa        04-FEB-2018

4       bbb        28-FEB-2018

5       bbb        18-SEP-2019

5       bbb        18-JUN-2019

 

Each id each has two docs "aaa" and "bbb", with the latter having a later date always. But there are data entry issues wherein the doc has been recorded incorrectly as seen for id 3 and 5. In case of id 3, the doc for 28-DEC-2017 should have been "bbb" and for id 5, the doc should be "aaa" for 18-JUN-2019.

 

How do I change the doc for all such cases? Again, the date for "aaa" has to be the earlier date and for "bbb" , it should be the later one. 

 

All help is appreciated.             


If you are having data entry issues how do you know the issue is not with the date entered and not the Doc value???

PDevi
Fluorite | Level 6
@ballardw Based on the business rules in our organization for this data set.
ballardw
Super User

@PDevi wrote:
@ballardw Based on the business rules in our organization for this data set.

Not attempting to sound like a wise guy but it sounds like you already have issues with someone following "business rules".

 

Now if the data is slugged from an automated process and assigned when the value is entered that might be one thing as the person entering the data (hopefully) doesn't get to mess with the date. But if the date is manually entered I submit that there may still be things to consider about the quality of the dates entered.

 

I deal with manually entered dates and have children getting medical procedures before they born (procedure and birth date entered into separate tables at different times manually) or people completing a program before starting and other date related issues.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 6 replies
  • 646 views
  • 0 likes
  • 4 in conversation