DATA Step, Macro, Functions and more

Extraction of data

Reply
Contributor
Posts: 66

Extraction of data

Hi 

I Am looking for the highlighted data to be extracted for the following data 

 

Identifier_LHS MAX_of_Date_Loaded_LHS Status_LHS Work_Status_LHS MAX_of_MAX_of_Version
153553 24MAR18:18:31:58 Clear Unworked 1
153553 24MAR18:18:32:16 Clear Unworked 2
153553 24MAR18:18:32:17 Clear Unworked 3
153553 24MAR18:18:32:20 Clear Worked 4
14297488 27FEB18:16:56:53 Suspect Worked 2
14297488 28MAR18:16:39:42 Suspect Unworked 3

 

Tried using the max of date and max of version ...but not getting the desired o/p

PROC Star
Posts: 1,328

Re: Extraction of data

sounds like basic sql exercise 

data have;
input Identifier_LHS MAX_of_Date_Loaded_LHS :datetime20. Status_LHS $ Work_Status_LHS :$15. MAX_of_MAX_of_Version;
format MAX_of_Date_Loaded_LHS datetime20.;
datalines;
153553 24MAR18:18:31:58 Clear Unworked 1
153553 24MAR18:18:32:16 Clear Unworked 2
153553 24MAR18:18:32:17 Clear Unworked 3
153553 24MAR18:18:32:20 Clear Worked 4
14297488 27FEB18:16:56:53 Suspect Worked 2
14297488 28MAR18:16:39:42 Suspect Unworked 3
;

proc sql;
create table want as
select *
from have
where Work_Status_LHS='Unworked'
group by Identifier_LHS
having MAX_of_Date_Loaded_LHS=max(MAX_of_Date_Loaded_LHS);
quit;
Contributor
Posts: 66

Re: Extraction of data

Posted in reply to novinosrin
no the status can vary ...to be unworked,clear,worked,etc.
Super User
Posts: 13,046

Re: Extraction of data

What is the rule that tells us the data contains the correct record?

Your statement of max/min not working indicates the datatime alone is insufficient (not that the highlighted first record was either a max or min for the identifier). Also "no the status can vary ...to be unworked,clear,worked,etc." does not tell what the selection rule might be.

 

Would it by any chance be that you want  MAX_of_MAX_of_Version=3 only?

Contributor
Posts: 66

Re: Extraction of data

Identifier_LHS Date_Loaded_LHS Status_LHS Work_Status_LHS MAX_of_Version Calculation
153553 24MAR18:18:32:16 Clear Unworked 4 1
153553 24MAR18:18:32:20 Clear Worked 4 1
153553 24MAR18:18:32:17 Clear Unworked 4 1
153553 24MAR18:18:31:58 Clear Unworked 4 1
153553 24MAR18:18:32:20 Clear Worked 4 1
153553 24MAR18:18:32:17 Clear Unworked 4 1
153553 24MAR18:18:32:16 Clear Unworked 4 1
153553 24MAR18:18:31:58 Clear Unworked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0

 

want to fetch the latest record with latest date and version is greater max.

 

but there are cases like record highlighted in blue where the date and time are the same. 

so after using the max function for date and version it dose not work. 

 

PROC Star
Posts: 1,328

Re: Extraction of data

why not just apply a simple distinct?

 

data have;
input Identifier_LHS Date_Loaded_LHS :datetime20. Status_LHS $ Work_Status_LHS :$15. MAX_of_Version Calculation;
format Date_Loaded_LHS datetime20.;
datalines;
153553 24MAR18:18:32:16 Clear Unworked 4 1
153553 24MAR18:18:32:20 Clear Worked 4 1
153553 24MAR18:18:32:17 Clear Unworked 4 1
153553 24MAR18:18:31:58 Clear Unworked 4 1
153553 24MAR18:18:32:20 Clear Worked 4 1
153553 24MAR18:18:32:17 Clear Unworked 4 1
153553 24MAR18:18:32:16 Clear Unworked 4 1
153553 24MAR18:18:31:58 Clear Unworked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 27FEB18:16:56:53 Suspect Worked 4 1
14297488 28MAR18:16:39:42 Suspect Unworked 4 0
;

proc sql;
create table want as
select distinct Identifier_LHS ,Date_Loaded_LHS ,Status_LHS, Work_Status_LHS ,MAX_of_Version ,Calculation
from have
group by Identifier_LHS
having Date_Loaded_LHS=max(Date_Loaded_LHS);
quit; 
Contributor
Posts: 66

Re: Extraction of data

Posted in reply to novinosrin
yes even if i apply distinct then i am getting the following output:
Identifier_LHS MAX_of_Date_Loaded_LHS Status_LHS Work_Status_LHS MAX_of_MAX_of_Version
153553 24MAR18:18:31:58 Clear Unworked 1
153553 24MAR18:18:32:16 Clear Unworked 2
153553 24MAR18:18:32:17 Clear Unworked 3
153553 24MAR18:18:32:20 Clear Worked 4
14297488 27FEB18:16:56:53 Suspect Worked 2
14297488 28MAR18:16:39:42 Suspect Unworked 3

PROC Star
Posts: 1,328

Re: Extraction of data

I tested my code with the sample you gave and it seems to work fine

here is the log:

 


113 proc sql;
114 create table want as
115 select distinct Identifier_LHS ,Date_Loaded_LHS ,Status_LHS,
115! Work_Status_LHS ,MAX_of_Version ,Calculation
116 from have
117 group by Identifier_LHS
118 having Date_Loaded_LHS=max(Date_Loaded_LHS);
NOTE: The query requires remerging summary statistics back with the
original data.
NOTE: Table WORK.WANT created, with 2 rows and 6 columns.

119 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

Trusted Advisor
Posts: 1,288

Re: Extraction of data

You apparently want one record per id at latest datetime.  Then what if you have this data, with tied records at maximum datetime 24MAR18:18:32:20?

 

153553 24MAR18:18:32:20 Clear Unworked 4 1
153553 24MAR18:18:32:20 Clear Worked   4 1
153553 24MAR18:18:32:20 Clear Unworked 5 1

 

These records have distinct values in other fields.  Then what is your rule for choosing a record?  The proc sql "select distinct" approach would yield all three.  Is that what you want?

Contributor
Posts: 66

Re: Extraction of data

thats what the querry is so we taking max date a max version would help . i guess
Ask a Question
Discussion stats
  • 9 replies
  • 105 views
  • 0 likes
  • 4 in conversation