Trying to use nodupkey but not sure its the right fit for this situati...

sirspeedyx0 · Posted 05-14-2019 12:53 PM

I am working on a dataset to track quantitative info related to contracts for a company I work for. Each contract can have up to a dozen or so rows of data assigned to it, and we use a unique contract ID to be able to identify an individual contract. However, occasionally we make a mistake when uploading the data, so we also assign sort of a consecutive upload ID so that we know that a contract has been uploaded more than once. Where it gets a little tricky is that the number of rows associated with the contract might change in between uploads. Here is an example:

Contract ID; Upload ID; Row

1, 1, 1

1, 1, 2

1, 1, 3

1, 1, 4

2, 1, 1

3, 1, 1

3, 1, 2

So here we have completed an upload of 3 contracts. Later on characteristics of the contracts have changed, so we have to do another upload, and then maybe we need yet another revision of one particular contract so we do a 3rd upload. Now the dataset looks like this:

Contract ID; Upload ID; Row

1, 1, 1

1, 1, 2

1, 1, 3

1, 1, 4

2, 1, 1

3, 1, 1

3, 1, 2

1, 2, 1

1, 2, 2

1, 2, 3

1, 2, 4

1, 2, 5

2, 2, 1

3, 2, 1

1, 3, 1

1, 3, 3

1, 3, 5

So the question is, is there a function that will pull me all rows related to a specific contract that are from the latest upload? So ultimately I'd want to pull this:

Contract ID; Upload ID; Row

2, 2, 1

3, 2, 1

1, 3, 1

1, 3, 3

1, 3, 5

Many thanks in advance for your help.

Reeza · Posted 05-14-2019 12:57 PM

Usually you'll have a date in that type of data as well indicating the load date or something along those lines. Do you have a variable that you're using for that? If so, this is trivial. If not, this is still doable, just more work.

sirspeedyx0 · Posted 05-14-2019 01:03 PM

Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.

Reeza · Posted 05-14-2019 01:22 PM

Unfortunately this doesn't lend itself well to FIRST/LAST though it can probably be done.

Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.

data have; 
infile cards dlm=',' truncover dsd;
input ContractID UpLoadID Row;
cards;
1, 1, 1
1, 1, 2
1, 1, 3
1, 1, 4
2, 1, 1
3, 1, 1
3, 1, 2
1, 2, 1
1, 2, 2
1, 2, 3
1, 2, 4
1, 2, 5
2, 2, 1
3, 2, 1
1, 3, 1
1, 3, 3
1, 3, 5
;

proc sql;
create table temp as
select contractID, max(uploadID) as latest_upload
from have
group by contractID;
quit;

proc sql;
create table want as
select t2.*
from temp as t1
inner join have as t2
on t1.contractID=t2.contractID and t1.latest_upload = t2.uploadID
order by 1,2,3;
quit;

@sirspeedyx0 wrote:

Unfortunately we don't have a reliable upload timestamp. We hoped that the Upload ID would replace that, as it counts up each time we upload a new batch of contracts.

sirspeedyx0 · Posted 05-14-2019 02:57 PM

I will be attempting to implement tomorrow, but I wanted to say thanks for this.

Regards,

speedy

FreelanceReinh · Posted 05-14-2019 04:29 PM

Hi @sirspeedyx0 (and btw welcome to the SAS Support Communities! :-)),

@Reeza wrote:

Here's a SQL approach that works, I kept it simple but you can make it more efficient/less steps if needed.

The shorter version Reeza mentioned could look like this:

proc sql;
create table want as
select * from have
group by contractID
having uploadID=max(uploadID)
order by 2,1,3; /* to match the example output */
quit;

Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Re: Trying to use nodupkey but not sure its the right fit for this situation. Please help!

Registration is open

SAS Training: Just a Click Away