BookmarkSubscribeRSS Feed
Cugel
Obsidian | Level 7

Hi I'm looking for a way to add a sequence number to a data series. When joining two oracle tables I can use the OVER and PARTITION BY function. But now I'm looking for an equivalent function that I can use when it comes to SAS tables. My employer wants me to find the solution in a data step expression.

 

At the moment we are still working with an Oracle database, but in 2022 the database will become a Microsoft SQL server database. So in the future I can't use “real” Oracle SQL statements

 

Thanks in advance for any kind of help. With kind regards.

4 REPLIES 4
Tom
Super User Tom
Super User

Please post text as text, not photographs and not attachments.  You can use the Insert Code button for text and the Insert SAS code button for code.  If you do post photographs then post them as photographs and not attachments, you can use the Insert Photos button.

 

So it looks like you have these two datasets:

Spoiler
data verr ;
  infile cards dsd dlm=';' truncover;
  input CLIENT_ID DBC_ID D_EXAMINATION :ddmmyy. CODE_EXAMINATION ;
  format D_EXAMINATION yymmdd10.;
cards4;
37900;1269307;19-07-2020;190013
37900;1269307;03-09-2020;190013
44600;1256861;03-10-2019;190013
44600;1256861;23-10-2019;190013
44600;1256861;08-12-2019;190013
44600;1256861;15-03-2020;190013
44600;1256861;20-09-2020;190013
54100;1253656;29-10-2019;190013
54100;1253656;26-01-2020;190013
57325;1263533;24-02-2020;190162
57325;1263533;16-04-2020;190162
57325;1263533;07-05-2020;190013
57325;1263533;03-08-2020;190162
57325;1263533;26-08-2020;190162
60975;1272676;22-09-2020;190013
76325;1268583;06-07-2020;190013
141400;1248870;21-07-2019;190013
141400;1248870;29-07-2019;190013
141400;1248870;03-09-2019;190013
183800;1245144;27-05-2019;190162
183800;1245144;01-07-2019;190013
233050;1254737;20-10-2019;190013
233050;1254737;17-11-2019;190013
233050;1254737;01-06-2020;190013
233050;1254737;15-07-2020;190162
235725;1241352;05-05-2019;190013
235725;1241352;09-06-2019;190013
235725;1241352;29-08-2019;190013
235725;1241352;15-10-2019;190013
235725;1241352;17-12-2019;190013
235725;1241352;22-01-2020;190013
235725;1260301;24-02-2020;190013
235725;1260301;16-04-2020;190013
235725;1260301;15-06-2020;190013
235725;1260301;19-07-2020;190162
235725;1260301;08-10-2020;190013
247725;1274542;27-09-2020;190013
282675;1269742;18-06-2020;190013
289875;1260064;08-01-2020;190013
289875;1260064;05-02-2020;190013
337850;1260633;08-03-2020;190013
416125;1268400;22-07-2020;190162
416125;1268400;08-09-2020;190162
459450;1253398;01-10-2019;190013
459450;1253398;29-10-2019;190013
459450;1253398;31-12-2019;190013
459450;1253398;11-02-2020;190013
475550;1271725;26-07-2020;190013
475550;1271725;30-08-2020;190013
475550;1271725;12-10-2020;190162
475550;1271725;20-10-2020;190162
478200;1257771;09-12-2019;190013
478200;1257771;15-01-2020;190013
478200;1257771;29-01-2020;190013
478200;1257771;11-03-2020;190013
478850;1269364;22-07-2020;190013
478850;1269364;08-09-2020;190162
486150;1235265;26-02-2019;190013
486150;1235265;03-04-2019;190162
486150;1235265;09-04-2019;190013
486150;1235265;21-05-2019;190162
486150;1235265;03-07-2019;190013
486150;1235265;25-09-2019;190013
486150;1235265;28-10-2019;190162
486150;1235265;01-11-2020;190162
;;;;

data dbc;
  infile cards dsd dlm=';' truncover;
  input CLIENT_ID DBC_ID D_DBC :ddmmyy. ;
  format D_DBC yymmdd10.;
cards4;
37900;1269307;17-06-2021
37900;1270212;30-06-2021
44600;1256861;03-09-2020
54100;1253656;01-10-2020
57325;1263533;02-03-2021
60975;1272676;12-08-2021
76325;1268583;07-06-2021
76325;1271385;21-07-2021
141400;1248870;10-07-2020
183800;1245144;01-05-2020
233050;1254737;19-10-2020
235725;1241352;05-04-2020
235725;1260301;16-01-2021
247725;1274542;15-09-2021
282675;1269742;24-06-2021
289875;1260064;09-01-2021
337850;1260633;23-01-2021
416125;1268400;04-06-2021
459450;1253398;28-09-2020
475550;1271725;26-07-2021
478200;1257771;03-12-2020
478850;1269364;19-06-2021
486150;1235265;03-02-2020
;;;;

You haven't said how to combine the two datasets.  Your names look distinct so let's just let SAS guess. You haven't said what the key fields are for your sequencing, let's guess you want to use  client_id, dbc_id, d_dbc, d_examination.

proc sql ;
  create table both as select * from verr natural join dbc 
   order by client_id, dbc_id, d_dbc, d_examination
;
quit;

data want;
  set both;
  by client_id dbc_id d_dbc d_examination;
  seqno+1;
  if first.d_dbc then seqno=1;
run;

Results:

       CLIENT_                                                 CODE_
Obs       ID       DBC_ID         D_DBC    D_EXAMINATION    EXAMINATION    seqno

  1      37900    1269307    2021-06-17     2020-07-19         190013        1
  2      37900    1269307    2021-06-17     2020-09-03         190013        2
  3      44600    1256861    2020-09-03     2019-10-03         190013        1
  4      44600    1256861    2020-09-03     2019-10-23         190013        2
  5      44600    1256861    2020-09-03     2019-12-08         190013        3
  6      44600    1256861    2020-09-03     2020-03-15         190013        4
  7      44600    1256861    2020-09-03     2020-09-20         190013        5
  8      54100    1253656    2020-10-01     2019-10-29         190013        1
  9      54100    1253656    2020-10-01     2020-01-26         190013        2
 10      57325    1263533    2021-03-02     2020-02-24         190162        1
 11      57325    1263533    2021-03-02     2020-04-16         190162        2
 12      57325    1263533    2021-03-02     2020-05-07         190013        3
 13      57325    1263533    2021-03-02     2020-08-03         190162        4
 14      57325    1263533    2021-03-02     2020-08-26         190162        5
 15      60975    1272676    2021-08-12     2020-09-22         190013        1
 16      76325    1268583    2021-06-07     2020-07-06         190013        1
 17     141400    1248870    2020-07-10     2019-07-21         190013        1
 18     141400    1248870    2020-07-10     2019-07-29         190013        2
 19     141400    1248870    2020-07-10     2019-09-03         190013        3
...

PS: If you display dates in MDY or DMY order which ever one you pick you will confuse half of your audience.

Cugel
Obsidian | Level 7
Hi Tom, Thanks for your response.



You made the right assumption that the tables should be linked based on the client_id and the dbc_id.



Your solution works but uses proc_sql while I'm looking for a solution to include as expression code.



Your sequencing assumption is also correct and should be determined based on the client_id, dbc_id and d_examination.



[cid:image001.jpg@01D7DCA1.A7380230]




Tom
Super User Tom
Super User

"expression code" in what?

There is not an "expression code" that will perform this type of dataset level operation.

 

Are you using some type of point and click tool that generates code instead of creating an actual program?
Which tool are you using?

 

If the tool can generate a data step (with a BY statement) then you might trick it by using the IFN() function.

seqno=ifn(first.d_dbc,1,lag(seqno)+1);

But not sure if the confusing nature of that code is worth it. 

 

Just add actual code step that creates the sequence number.  If performance is an issue use a VIEW for the input this new step.

Cugel
Obsidian | Level 7

Hi Tom, Our organization uses SAS DI studio.Screen Shot 11-18-21 at 05.41 PM.JPGScreen Shot 11-18-21 at 05.42 PM.JPG

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 2600 views
  • 0 likes
  • 2 in conversation