I'm trying to "translate" this data step to SQL
%let N = 1000;
data want;
do i=1 to &N;
output; end;
run;
which simply creates such a table
I tried many codes using functions from other platform/languages but none of them work, for example
/*SNOWFLAKE*/
proc sql;
select seq4() as number
from table(generator(rowcount => &N))
order by 1;
quit;
/*SQL SERVER*/
proc sql;
SELECT * FROM generate_series(0,&N);
quit;
proc sql;
SELECT * FROM UNNEST(GENERATE_ARRAY(0, &N));
quit;
Any ideas? Is there a similar function in SAS?
I've just noticed a minor inefficiency in the code that I copied from my old post: If N happens to be a power of 2 (2, 4, 8, 16, ...), view BASE creates 2N observations, while N would be sufficient. This can be avoided by replacing int(log2(&N)-1) with ceil(log2(&N)-2):
%let N=100; proc sql noprint; create table seed (x num); insert into seed set x=0 set x=1; select 'seed'||repeat(',seed',ceil(log2(&N)-2)) into :ds from seed; create view base as select ' ' from &ds; create table want(where=(i<=&N)) as select monotonic() as i from base; quit;
Look up the monotonic() function in proc sql.
I meant to include the example code below in my last post.
proc sql;
create table myair as
select *
, monotonic() as rowidx
from sashelp.air;
quit;
I see that you use an input table and count the rows. But what if N is 10000, 10000000 or bigger? Isn't possible to create such a table without using an input table?
In general SQL does not support looping structures like DO i= . It also generally does not process records in anything resembling a specific order. So why is this supposed to be done in SQL?
If this just a learning exercise the take away is that this sort of activity is not an SQL strong point.
Hello @Rabelais,
I think in PROC SQL you might use the undocumented MONOTONIC function -- as ChanceTGardener has already suggested -- and so did I (reluctantly) when someone asked a similar question in 2019 (see Re: DO Loop with OUTPUT in SQL?). So, if you have a dataset at hand with at least N observations (e.g., the 428 obs. in SASHELP.CARS for N=100), you can use something like this:
%let N=100;
proc sql;
create table want(where=(i<=&N)) as
select monotonic() as i from sashelp.cars;
quit;
Otherwise it gets more complicated as you need to create such a dataset (or view) from scratch, e.g., like view BASE below:
proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;
select 'seed'||repeat(',seed',int(log2(&N)-1)) into :ds
from seed;
create view base as
select ' ' from &ds;
create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;
Again, this is really not recommended. Use a DATA step instead.
I've just noticed a minor inefficiency in the code that I copied from my old post: If N happens to be a power of 2 (2, 4, 8, 16, ...), view BASE creates 2N observations, while N would be sufficient. This can be avoided by replacing int(log2(&N)-1) with ceil(log2(&N)-2):
%let N=100; proc sql noprint; create table seed (x num); insert into seed set x=0 set x=1; select 'seed'||repeat(',seed',ceil(log2(&N)-2)) into :ds from seed; create view base as select ' ' from &ds; create table want(where=(i<=&N)) as select monotonic() as i from base; quit;
There is the undocumented (and NOT supported) monotonic() functions mentioned by others. My advice is to never use undocumented functions in a production environment.
You can't do this with SAS ANSI SQL and you don't have to given it's so simple using a SAS datastep.
There are SQL extensions in many databases that allow for creation of such row numbering and if you need this in a database table then you can use explicit passthrough SQL.
This sounds like an XY Problem.
Why do you want make such a dataset?
Why don't you just use a data step since you already showed you know how to that?
Why would you even want to attempt to do it in PROC SQL?
Curiosity is the lust of the mind!
As a new SAS user, I'm asking myself lot of questions.
Q2 & Q3: (1) Lot of tasks can be done by using different approaches and since I'm working with large datasets (this is not the case since generating even a big sequence is fast) I usually look for the most efficient one. (2) Since I'm new to SAS I often wonder if my code can be improved. (3) Also, sometimes I'm simply curious about different ways of doing the same task.
Q1: see the toy example below (it's just 10 rows, but I'm working with millions).
Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.
I was wondering if the second data step and the proc sql could be easily combined into a single proc sql or data step.
I tried using PROC EXPAND to interpolate the missing "i" values, but what it does is rather to replace a missing value with an interpolated one on an already present row, it doesn't add new rows.
data have;
input i c; cards;
7 0.2
3 0.5
6 0.3
run;
data sequence;
do i=1 to 10;
output;
end;
run;
proc sql;
create table want as select
a.i, b.c
from sequence a
left join have b
on a.i = b.i;
quit;
@Rabelais wrote:
Q1: see the toy example below (it's just 10 rows, but I'm working with millions).
Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.
I strongly suggest that you show an example of what that expansion looks like. I can think of at least two possible different "expansions" but without an idea of the result should look like it isn't worth bothering to write any example code.
I added pictures with code's output, i.e. HAVE and WANT datasets
proc sort data=have; by i; run; data want; merge sequence have; by i; run;
Assuming sequence is built as described.
The explicit sort and merge may run faster than SQL as SQL does that in the background for the Join on.
BUT you talk about "millions" of observations. Are any of the I values in HAVE ever duplicated? If so you need to include such in your example and perhaps describe rules involved.
I would think PROC EXPAND could fill out a series for you with a constant value. Perhaps you just did not find the right option for how to specify the constant value you wanted it to use?
But don't bother trying to make SQL do something like iterating. That is not part of the grammar that SQL uses to describe the actions it can take.
If you want to make sure your data always has observations for ID=1 to 10 then do something like this:
data skeleton;
do id=1 to 10;
output;
end;
run;
data want;
merge skeleton actual_data;
by id;
run;
No SQL needed (or wanted).
SQL is designed to manage data and manipulate it, typically not to create new rows/records. Even a pivot isn't a simple operation in SQL.
SAS on the other hand is more data wrangling/analysis friendly.
Right tool for the right job.
Several different sql options are listed here you can play around with different DBs, and different dbs will have different methods.
But not SQL also has the ability to create unique key identifiers on each row automatically, which SAS does not.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.