Solved: Re: Create a table with increasing integers from 1 to N in SQL

Rabelais · Posted 02-29-2024 12:06 PM

I'm trying to "translate" this data step to SQL

%let N = 1000;
data want;
	do i=1 to &N;
		output; end;
run;

which simply creates such a table

I tried many codes using functions from other platform/languages but none of them work, for example

/*SNOWFLAKE*/
proc sql;
	select seq4() as number
	from table(generator(rowcount => &N))
	order by 1;
quit;

/*SQL SERVER*/
proc sql;
	SELECT * FROM generate_series(0,&N);
quit;

proc sql;
	SELECT * FROM UNNEST(GENERATE_ARRAY(0, &N));
quit;

Any ideas? Is there a similar function in SAS?

FreelanceReinh · Posted 03-02-2024 04:49 AM

I've just noticed a minor inefficiency in the code that I copied from my old post: If N happens to be a power of 2 (2, 4, 8, 16, ...), view BASE creates 2N observations, while N would be sufficient. This can be avoided by replacing int(log2(&N)-1) with ceil(log2(&N)-2):

%let N=100;

proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;

select 'seed'||repeat(',seed',ceil(log2(&N)-2)) into :ds
from seed;

create view base as
select ' ' from &ds;

create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;

View solution in original post

ChanceTGardener · Posted 02-29-2024 12:28 PM

Look up the monotonic() function in proc sql.

ChanceTGardener · Posted 02-29-2024 12:31 PM

I meant to include the example code below in my last post.

proc sql;
 create table myair as 
  select *
  ,	     monotonic() as rowidx
  from sashelp.air;
quit;

Rabelais · Posted 02-29-2024 12:39 PM

I see that you use an input table and count the rows. But what if N is 10000, 10000000 or bigger? Isn't possible to create such a table without using an input table?

ballardw · Posted 02-29-2024 12:36 PM

In general SQL does not support looping structures like DO i= . It also generally does not process records in anything resembling a specific order. So why is this supposed to be done in SQL?

If this just a learning exercise the take away is that this sort of activity is not an SQL strong point.

FreelanceReinh · Posted 02-29-2024 12:47 PM

Hello @Rabelais,

I think in PROC SQL you might use the undocumented MONOTONIC function -- as ChanceTGardener has already suggested -- and so did I (reluctantly) when someone asked a similar question in 2019 (see Re: DO Loop with OUTPUT in SQL?). So, if you have a dataset at hand with at least N observations (e.g., the 428 obs. in SASHELP.CARS for N=100), you can use something like this:

%let N=100;

proc sql;
create table want(where=(i<=&N)) as
select monotonic() as i from sashelp.cars;
quit;

Otherwise it gets more complicated as you need to create such a dataset (or view) from scratch, e.g., like view BASE below:

proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;

select 'seed'||repeat(',seed',int(log2(&N)-1)) into :ds
from seed;

create view base as
select ' ' from &ds;

create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;

Again, this is really not recommended. Use a DATA step instead.

FreelanceReinh · Posted 03-02-2024 04:49 AM

I've just noticed a minor inefficiency in the code that I copied from my old post: If N happens to be a power of 2 (2, 4, 8, 16, ...), view BASE creates 2N observations, while N would be sufficient. This can be avoided by replacing int(log2(&N)-1) with ceil(log2(&N)-2):

%let N=100;

proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;

select 'seed'||repeat(',seed',ceil(log2(&N)-2)) into :ds
from seed;

create view base as
select ' ' from &ds;

create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;

Patrick · Posted 02-29-2024 07:20 PM

There is the undocumented (and NOT supported) monotonic() functions mentioned by others. My advice is to never use undocumented functions in a production environment.

You can't do this with SAS ANSI SQL and you don't have to given it's so simple using a SAS datastep.

There are SQL extensions in many databases that allow for creation of such row numbering and if you need this in a database table then you can use explicit passthrough SQL.

Tom · Posted 02-29-2024 08:11 PM

This sounds like an XY Problem.

Why do you want make such a dataset?
Why don't you just use a data step since you already showed you know how to that?

Why would you even want to attempt to do it in PROC SQL?

Rabelais · Posted 03-01-2024 09:04 AM

@Tom

Curiosity is the lust of the mind!

As a new SAS user, I'm asking myself lot of questions.

Q2 & Q3: (1) Lot of tasks can be done by using different approaches and since I'm working with large datasets (this is not the case since generating even a big sequence is fast) I usually look for the most efficient one. (2) Since I'm new to SAS I often wonder if my code can be improved. (3) Also, sometimes I'm simply curious about different ways of doing the same task.

Q1: see the toy example below (it's just 10 rows, but I'm working with millions).

Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.

I was wondering if the second data step and the proc sql could be easily combined into a single proc sql or data step.

I tried using PROC EXPAND to interpolate the missing "i" values, but what it does is rather to replace a missing value with an interpolated one on an already present row, it doesn't add new rows.

data have;
input i c; cards;
7 0.2
3 0.5
6 0.3
run;

data sequence;
	do i=1 to 10;
		output;
	end;
run;

proc sql;
	create table want as select
		a.i, b.c
	from sequence a
	left join have b
	on a.i = b.i;
quit;

ballardw · Posted 03-01-2024 11:43 AM

@Rabelais wrote:

@Tom

Q1: see the toy example below (it's just 10 rows, but I'm working with millions).

Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.

I strongly suggest that you show an example of what that expansion looks like. I can think of at least two possible different "expansions" but without an idea of the result should look like it isn't worth bothering to write any example code.

Rabelais · Posted 03-01-2024 11:53 AM

@ballardw

I added pictures with code's output, i.e. HAVE and WANT datasets

ballardw · Posted 03-01-2024 12:16 PM

proc sort data=have;
   by i;
run;

data want;
  merge sequence
        have;
  by i;
run;

Assuming sequence is built as described.

The explicit sort and merge may run faster than SQL as SQL does that in the background for the Join on.

BUT you talk about "millions" of observations. Are any of the I values in HAVE ever duplicated? If so you need to include such in your example and perhaps describe rules involved.

Tom · Posted 03-01-2024 11:59 AM

I would think PROC EXPAND could fill out a series for you with a constant value. Perhaps you just did not find the right option for how to specify the constant value you wanted it to use?

But don't bother trying to make SQL do something like iterating. That is not part of the grammar that SQL uses to describe the actions it can take.

If you want to make sure your data always has observations for ID=1 to 10 then do something like this:

data skeleton;
  do id=1 to 10;
    output;
  end;
run;
data want;
  merge skeleton actual_data;
  by id;
run;

No SQL needed (or wanted).

Reeza · Posted 03-01-2024 12:28 PM

SQL is designed to manage data and manipulate it, typically not to create new rows/records. Even a pivot isn't a simple operation in SQL.

SAS on the other hand is more data wrangling/analysis friendly.

Right tool for the right job.

Several different sql options are listed here you can play around with different DBs, and different dbs will have different methods.

But not SQL also has the ability to create unique key identifiers on each row automatically, which SAS does not.

SAS Innovate 2025: Save the Date

SAS Training: Just a Click Away