I am working on a dataset to track quantitative info related to contracts for a company I work for. Each contract can have up to a dozen or so rows of data assigned to it, and we use a unique contract ID to be able to identify an individual contract. However, occasionally we make a mistake when uploading the data, so we also assign sort of a consecutive upload ID so that we know that a contract has been uploaded more than once. Where it gets a little tricky is that the number of rows associated with the contract might change in between uploads. Here is an example:
Contract ID; Upload ID; Row
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
So here we have completed an upload of 3 contracts. Later on characteristics of the contracts have changed, so we have to do another upload, and then maybe we need yet another revision of one particular contract so we do a 3rd upload. Now the dataset looks like this:
Contract ID; Upload ID; Row
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
So the question is, is there a function that will pull me all rows related to a specific contract that are from the latest upload? So ultimately I'd want to pull this:
Contract ID; Upload ID; Row
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
Many thanks in advance for your help.
Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.
Unfortunately this doesn't lend itself well to FIRST/LAST though it can probably be done.
Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.
data have;
infile cards dlm=',' truncover dsd;
input ContractID UpLoadID Row;
cards;
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
;
proc sql;
create table temp as
select contractID, max(uploadID) as latest_upload
from have
group by contractID;
quit;
proc sql;
create table want as
select t2.*
from temp as t1
inner join have as t2
on t1.contractID=t2.contractID and t1.latest_upload = t2.uploadID
order by 1,2,3;
quit;
@sirspeedyx0 wrote:
Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.
I will be attempting to implement tomorrow, but I wanted to say thanks for this.
Regards,
speedy
Hi @sirspeedyx0 (and btw welcome to the SAS Support Communities! :-)),
@Reeza wrote:
Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.
The shorter version Reeza mentioned could look like this:
proc sql;
create table want as
select * from have
group by contractID
having uploadID=max(uploadID)
order by 2,1,3; /* to match the example output */
quit;
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.