Minutes elapsed correcting code

Accepted Solution Solved
Reply
New Contributor
Posts: 4
Accepted Solution

Minutes elapsed correcting code

The data set I am using has approx 200k rows and 25 columns. There are approx 100 rows for 2000 different games. Each game has a time elapsed variable that goes from 0-90 ( some minutes are damaged, see below, some are all together missing).

I have a minutes elapsed variable in my data set that has been damaged due to extraction and needs to be altered to be used.

The column on the left shows how my data currently sits. The column on the right shows what I need it to be. There are 3 different types of issue that need to be addressed with the code.

1     -     1

1     -     2

3     -     3

4     -     4

1     -     1

2     -     2

2     -     2

3     -     3

1     -     1

3     -     2

3     -     3

4     -     4

Since I have 2000 different games the code needs to loop independently over 2000 unique identifiers that already sit on in the data set.

Any help would be much appreciated.

Let me know if you have questions


Accepted Solutions
Solution
‎11-06-2013 11:53 AM
Super Contributor
Posts: 339

Re: Minutes elapsed correcting code

data have;

     input groupid col1;

     datalines;

1 1

1 1

1 3

1 4

2 1

2 2

2 2

2 3

3 1

3 3

3 3

3 4

;

run;

I assume your data was sequential and extracted retaining the sequence hence why you are doing the adjustments. However, I would state out that if your original data is still available, it would probably be easier for us to help you extract it appropriately than achieve such odd fix.

I assume in your dataset, that the records for a given gameid are not all after one another as in the data example I built so I won't use a set with point= and _N_±1 approach to look ahead and look behind in sequence.

step1 provide a unique id with to preserve ordering of your data

data have2;

     set have;

     specialid=_N_;

run;

step2 use hash tables with ordered option and a hash iterator to lookbehind and lookahead

data have3;

     length prev1 previd next1 nextid specialid 8.;

     if _N_=1 then do;

          declare hash prevhash(dataset: "have2(rename=(col1=prev1 groupid=previd))", ordered: "a");

          declare hiter previter('prevhash');

          prevhash.definekey('previd', 'specialid');

          prevhash.definedata('previd','prev1');

          prevhash.definedone();

          declare hash nexthash(dataset: "have2(rename=(col1=next1 groupid=nextid))", ordered: "a");

          declare hiter nextiter('nexthash');

          nexthash.definekey('nextid', 'specialid');

          nexthash.definedata('nextid','next1');

          nexthash.definedone();

     end;

     set have2;

 

     rcprev=previter.setcur(key: groupid, key:specialid);

     rcprev=previter.prev();

     rcnext=nextiter.setcur(key: groupid, key:specialid);

     rcnext=nextiter.next();

     output;

run;

I broke it down in an additionnal step in case I misunderstand how you ought to handle the logic of fixing. There is also a potential backdraw of the above if you have a case like (1 3 3 5) in the left column as since the 3rd value's previous data is done before fixing it from 3 to 2, the end result would be (1 2 4 5) instead of (1 2 3 5). Anyway, I've provided you with a mean to acquire previous and next data by groupid (or game) through hash objects. It is possible to use set and point= if you presort your data by groupid.

Last step would be to do adjustments by row using conditional rationale according to previd, nextid, prev1 and next1. These steps could've been inserted before the output statement in the above data step

data want;

     set have3;

     if previd=groupid=nextid and col1=next1 and col1-prev1>1 then colfixed=col1-1;

     else if previd=groupid=nextid and col1=prev1 and next1-col1>1 then colfixed=col1+1;

     else colfixed=col1;

     keep groupid col1 colfixed;

run;

View solution in original post


All Replies
Super User
Posts: 17,724

Re: Minutes elapsed correcting code

I don't follow the logic, how do you know when to increment?

New Contributor
Posts: 4

Re: Minutes elapsed correcting code

Sorry it was difficult enough to explain let alone code.

Scenario 1: If you have a repeated minute on the left hand column i.e.

1

1

3

4

It needs changing to

1

2

3

4

Scenario 2: If you have a repeated minute on the left hand column i.e.

1

3

3

4

It needs changing to

1

2

3

4

Scenario 3: However if you have a repeated minute on the left hand column i.e.

1

2

2

3

It is fine and can be left alone as every minute has been accounted for.

In summary:

- Code only attempts to correct if a minute is repeated

- only when a minute has been missed does a correction take place.

Hope that makes sense.

Solution
‎11-06-2013 11:53 AM
Super Contributor
Posts: 339

Re: Minutes elapsed correcting code

data have;

     input groupid col1;

     datalines;

1 1

1 1

1 3

1 4

2 1

2 2

2 2

2 3

3 1

3 3

3 3

3 4

;

run;

I assume your data was sequential and extracted retaining the sequence hence why you are doing the adjustments. However, I would state out that if your original data is still available, it would probably be easier for us to help you extract it appropriately than achieve such odd fix.

I assume in your dataset, that the records for a given gameid are not all after one another as in the data example I built so I won't use a set with point= and _N_±1 approach to look ahead and look behind in sequence.

step1 provide a unique id with to preserve ordering of your data

data have2;

     set have;

     specialid=_N_;

run;

step2 use hash tables with ordered option and a hash iterator to lookbehind and lookahead

data have3;

     length prev1 previd next1 nextid specialid 8.;

     if _N_=1 then do;

          declare hash prevhash(dataset: "have2(rename=(col1=prev1 groupid=previd))", ordered: "a");

          declare hiter previter('prevhash');

          prevhash.definekey('previd', 'specialid');

          prevhash.definedata('previd','prev1');

          prevhash.definedone();

          declare hash nexthash(dataset: "have2(rename=(col1=next1 groupid=nextid))", ordered: "a");

          declare hiter nextiter('nexthash');

          nexthash.definekey('nextid', 'specialid');

          nexthash.definedata('nextid','next1');

          nexthash.definedone();

     end;

     set have2;

 

     rcprev=previter.setcur(key: groupid, key:specialid);

     rcprev=previter.prev();

     rcnext=nextiter.setcur(key: groupid, key:specialid);

     rcnext=nextiter.next();

     output;

run;

I broke it down in an additionnal step in case I misunderstand how you ought to handle the logic of fixing. There is also a potential backdraw of the above if you have a case like (1 3 3 5) in the left column as since the 3rd value's previous data is done before fixing it from 3 to 2, the end result would be (1 2 4 5) instead of (1 2 3 5). Anyway, I've provided you with a mean to acquire previous and next data by groupid (or game) through hash objects. It is possible to use set and point= if you presort your data by groupid.

Last step would be to do adjustments by row using conditional rationale according to previd, nextid, prev1 and next1. These steps could've been inserted before the output statement in the above data step

data want;

     set have3;

     if previd=groupid=nextid and col1=next1 and col1-prev1>1 then colfixed=col1-1;

     else if previd=groupid=nextid and col1=prev1 and next1-col1>1 then colfixed=col1+1;

     else colfixed=col1;

     keep groupid col1 colfixed;

run;

☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 3 replies
  • 196 views
  • 0 likes
  • 3 in conversation