BookmarkSubscribeRSS Feed
brophymj
Quartz | Level 8

I have a table of the form:

Field1Field2Field3Sequence
.........4
.........13
.........27

I would like to adjust the table so that I fill all the sequence gaps with identical information to the last sequence so for the gap between 4 and 13 I would create 8 rows with identical field entries to those in sequence 4 (not 13). Similarly for the gap between 13 and 27 I would create 13 new rows with identical field values to thos in 13. And so on.

Any help would be much appreciated.

7 REPLIES 7
CTorres
Quartz | Level 8

Try this:

data have;

  input field1 field2 field3 sequence;

  cards;

1 2 3 4

4 5 6 13

7 8 9 27

;

run;

data want(drop=seq1 seq2 i);

  if 0 then set have nobs=n;

  retain rec seq1 seq2;

  rec+1;

  if rec=n+1 then stop;

  set have point=rec;

  output;

  seq1=sequence;

  rec+1;

  if rec=n+1 then stop;

  set have point=rec;

  seq2=sequence;

  rec+(-1);

  set have point=rec;

  do i=seq1+1 to seq2-1;

    sequence=i;

    output;

  end;

run;

CTorres

brophymj
Quartz | Level 8

Many thanks Torres, Please see my response below. I had left out some detail that i thought would not be a problem but looking at your solution I think the code needs to be adjusted to allow for the additional detail.

slchen
Lapis Lazuli | Level 10

data want(drop=_sequence);

merge have have(firstobs=2 keep=sequence rename=(sequence=_sequence)) end=last;

if last then output;

else do sequence=sequence to _sequence-1;

   output;

end;

run;

brophymj
Quartz | Level 8

Hi Slchen

Thanks for your help - please see my query below. It provides a bit more detail I originally had left out. It complicates things a bit more so i'm not sure how to adapt your approach to this problem.

Astounding
PROC Star

Here's one way:

data want;

   set have end=done;

   if not done then do;

      set have (firstobs=2 keep=sequence rename=(sequence=next_sequence));

      do sequence = sequence to next_sequence-1;

         output;

      end;

   end;

   else output;

   drop next_sequence;

run;

Good luck.

brophymj
Quartz | Level 8

Many thank for that I just have one further query. The above works well for the sample dataset I gave but I'm not sure how it would apply to the actual dataset I'm using as it is a little more complex.

Basically, I have a very large dataset containing insurance policy transaction information for commercial property risks (so all the historic renewals etc). Each policy has a unique policy number and within each policy there are multiple locations. Within each of these locations there is a historic set of transactions each with a unique sequence number. When you go into one of these sequences you get a list of all the poperties in a scroll screen.The property details are stored in a seperate file but I the common link between the two is policy_no||policy_location||policy_seq. Now the main data set will have all the sequences with no gaps so the sequence numbers will range from 1 to, say 150, and the latest one is always 99999. So the sequence is 1,2,3,...,149,150,99999. The scroll screen with the property details is only update each time a building is added or deleted so the sequence has gaps. So, using the example above, the sequence in the scroll screen might be 1,5,13,24,144. The scroll screen with sequence 1 will be attached to the main dataset with sequence 1-4; the scroll screen with sequence 5 will be attached to the main dataset with sequence 5 to 12, and so on... The scroll screen with sequence 144 will be attached to the main dataset with sequence 144 to 99999.

The main dataset contains 100s of policies with 1000s of locations and 10000s of sequence numbers. So my question is if the above code can be adjusted to allow for this i.e. repeat the process above for every location for every policy?

Many thanks

Astounding
PROC Star

Well, the code as is might be adaptable but I'm wary of doing that.  You already described a data set with billions of observations and now you are asking about how to expand that.  Perhaps you would be better off describing your ultimate goal and seeing if there might be another approach to achieving it.

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 1306 views
  • 1 like
  • 4 in conversation