I am working on a dataset to track quantitative info related to contracts for a company I work for. Each contract can have up to a dozen or so rows of data assigned to it, and we use a unique contract ID to be able to identify an individual contract. However, occasionally we make a mistake when uploading the data, so we also assign sort of a consecutive upload ID so that we know that a contract has been uploaded more than once. Where it gets a little tricky is that the number of rows associated with the contract might change in between uploads. Here is an example:
Contract ID; Upload ID; Row
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
So here we have completed an upload of 3 contracts. Later on characteristics of the contracts have changed, so we have to do another upload, and then maybe we need yet another revision of one particular contract so we do a 3rd upload. Now the dataset looks like this:
Contract ID; Upload ID; Row
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
So the question is, is there a function that will pull me all rows related to a specific contract that are from the latest upload? So ultimately I'd want to pull this:
Contract ID; Upload ID; Row
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
Many thanks in advance for your help.
Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.
Unfortunately this doesn't lend itself well to FIRST/LAST though it can probably be done.
Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.
data have;
infile cards dlm=',' truncover dsd;
input ContractID UpLoadID Row;
cards;
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
;
proc sql;
create table temp as
select contractID, max(uploadID) as latest_upload
from have
group by contractID;
quit;
proc sql;
create table want as
select t2.*
from temp as t1
inner join have as t2
on t1.contractID=t2.contractID and t1.latest_upload = t2.uploadID
order by 1,2,3;
quit;
@sirspeedyx0 wrote:
Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.
I will be attempting to implement tomorrow, but I wanted to say thanks for this.
Regards,
speedy
Hi @sirspeedyx0 (and btw welcome to the SAS Support Communities! :-)),
@Reeza wrote:
Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.
The shorter version Reeza mentioned could look like this:
proc sql;
create table want as
select * from have
group by contractID
having uploadID=max(uploadID)
order by 2,1,3; /* to match the example output */
quit;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.