BookmarkSubscribeRSS Feed
SteinOk
Calcite | Level 5

I have data where i have a broken sequence of numbers (seq_have) where i want to continue the sequence from largest number in the existing sequence for each ID, for the observations with missing sequence numbers. The order of the observations without sequence numbers is arbitrary, as long as it doesn't overlap and is larger then the largest number of the existing sequence inside the group.

 

The solution i have come up with doesn't seem very elegant, and I'm worried it will be prone to errors once implemented at a larger scale.

 

Any thoughts on the likelihood of errors that could occur, or how to improve the solution?

The example: 

data example;
input ID $ seq_have data;
cards;
1 1 111
1 2 222
1 . 333
2 . 444
2 2 222
2 3 333
3 . 123
3 . 123
3 . 123
4 4 444
4 . 555
;
run;

Attempted solution: 

proc sql;
 create table int1 as select *, max(seq_have) as seq_max from example group by id;
quit;

proc sort data=int1;
 by id seq_have;
run;

data int2;
 set int1;
 by id ;
 seq_new +1;
 if first.id then seq_new = 1;
run;

data result;
set int2;
 if seq_have eq . then 
  seq_res = sum(seq_new,seq_max);
 if seq_have not eq . then seq_res = seq_have;
run;

proc sort data=result;
 by id seq_res;
run;

 

 

 

 

 

 

 

 

 

 

3 REPLIES 3
RichardDeVen
Barite | Level 11

Use a DOW loop to compute the max seq_have in each group, then in subsequent loop use max_seq_have as basis for computing needed seq_have.

 

Example:

data want;
  do until (last.id);
    set have;
    by id;
    max_seq_have = max(max_seq_have, seq_have);
  end;

  do until (last.id);
    set have;
    if missing(seq_have) then do;
      _k = sum(_k, 1); 
      seq_have = sum (max_seq_have, _k);
    end;
    output;

  drop _k max_seq_have;
  end;
DavePrinsloo
Pyrite | Level 9

If you have access to a database like Oracle, then I recommend the use of database sequences to ensure you keep sequences clean.
This is is especially useful is the numbers being sequenced are id columns and the tables may get records added by multiple program which may potentially run in parallel.   The database guarantees that no numbers are duplicated - which can happen if concurrently executing SAS programs all get the max value from a table and deduce the next number.


I recently created a sas macro that uses Oracle passthru to initialise such Oracle sequences from the current maximum value (+ 1)
This use Oracle passthrough.
Once the sequences are established another macro gets the next sequence val.
The heart of the macro has code like:

proc sql noprint;  
   connect to Oracle (&connection_info);
 
 /* not shown here: code to check if the  sequence exists and create 
    it if needed */

   select nextval into :_netxval from
      connection to Oracle (select &sequence..nextval from dual);
  disconnect from Oracle;

 

The performance of identifying the maximum value depends on the number of records in the table.   Oracle sequences (and I have done the same in DB2) keep an underlying table with the sequence name and next value.  Queries to it locks the row, provides the next value, and then  unlocks the row, thereby guaranteeing uniqueness.   Oracle always does this in a fraction of a second.

smantha
Lapis Lazuli | Level 10

Assuming the input lay out is what you have and the data is ordered by id things are little simpler. You have the following scenarios

1. What if the by group starts with a missing value and the first non miss value in the by group is 1?

2. The first value is missing but all the other values are okay

 

Assumption: Data is already ordered by id and everything is in place

 

   

data example;
input ID $ seq_have data;
cards;
1 1 111
1 2 222
1 . 333
2 . 444
2 2 222
2 3 333
3 . 123
3 . 123
3 . 123
4 4 444
4 . 555
;
run;

data example2;
set example;
by id;
retain have_seq;
if first.id then do;
	if missing(seq_have) then do;
		have_seq=1;
	end;
	else do;
          have_seq=seq_have;
    end;
end;
else do;
if ^missing(seq_have) then have_seq=seq_have;
	if missing(seq_have) then do;
		have_seq=have_seq+1;
		*have_seq = seq_have;
	end;
end;
run;

 

   

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 3752 views
  • 2 likes
  • 4 in conversation