BookmarkSubscribeRSS Feed
sirspeedyx0
Calcite | Level 5

I am working on a dataset to track quantitative info related to contracts for a company I work for.  Each contract can have up to a dozen or so rows of data assigned to it, and we use a unique contract ID to be able to identify an individual contract.  However, occasionally we make a mistake when uploading the data, so we also assign sort of a consecutive upload ID so that we know that a contract has been uploaded more than once.  Where it gets a little tricky is that the number of rows associated with the contract might change in between uploads.  Here is an example:

 

Contract ID; Upload ID; Row

1, 1, 1

1, 1, 2

1, 1, 3

1, 1, 4

2, 1, 1

3, 1, 1

3, 1, 2

 

So here we have completed an upload of 3 contracts.  Later on characteristics of the contracts have changed, so we have to do another upload, and then maybe we need yet another revision of one particular contract so we do a 3rd upload.  Now the dataset looks like this:

 

Contract ID; Upload ID; Row

1, 1, 1

1, 1, 2

1, 1, 3

1, 1, 4

2, 1, 1

3, 1, 1

3, 1, 2

1, 2, 1

1, 2, 2

1, 2, 3

1, 2, 4

1, 2, 5

2, 2, 1

3, 2, 1

1, 3, 1

1, 3, 3

1, 3, 5

 

So the question is, is there a function that will pull me all rows related to a specific contract that are from the latest upload?  So ultimately I'd want to pull this:

 

Contract ID; Upload ID; Row

2, 2, 1

3, 2, 1

1, 3, 1

1, 3, 3

1, 3, 5

 

Many thanks in advance for your help.

 

5 REPLIES 5
Reeza
Super User
Usually you'll have a date in that type of data as well indicating the load date or something along those lines. Do you have a variable that you're using for that? If so, this is trivial. If not, this is still doable, just more work.
sirspeedyx0
Calcite | Level 5

Unfortunately we don't have a reliable upload timestamp.  We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.

Reeza
Super User

Unfortunately this doesn't lend itself well to FIRST/LAST though it can probably be done. 

 

Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.

 

data have; 
infile cards dlm=',' truncover dsd;
input ContractID UpLoadID Row;
cards;
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
;

proc sql;
create table temp as
select contractID, max(uploadID) as latest_upload
from have
group by contractID;
quit;

proc sql;
create table want as
select t2.*
from temp as t1
inner join have as t2
on t1.contractID=t2.contractID and t1.latest_upload = t2.uploadID
order by 1,2,3;
quit;

@sirspeedyx0 wrote:

Unfortunately we don't have a reliable upload timestamp.  We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.


 

sirspeedyx0
Calcite | Level 5

I will be attempting to implement tomorrow, but I wanted to say thanks for this.

 

Regards,

 

speedy

 

FreelanceReinh
Jade | Level 19

Hi @sirspeedyx0 (and btw welcome to the SAS Support Communities! :-)),

@Reeza wrote:

Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.

The shorter version Reeza mentioned could look like this:

proc sql;
create table want as
select * from have
group by contractID
having uploadID=max(uploadID)
order by 2,1,3; /* to match the example output */
quit;
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 5 replies
  • 1238 views
  • 2 likes
  • 3 in conversation