BookmarkSubscribeRSS Feed
Rabelais
Fluorite | Level 6

I'm trying to "translate" this data step to SQL

 

%let N = 1000;
data want; do i=1 to &N; output; end; run;

 

which simply creates such a table

Rabelais_0-1709226229549.png

I tried many codes using functions from other platform/languages but none of them work, for example

 

/*SNOWFLAKE*/
proc sql;
	select seq4() as number
	from table(generator(rowcount => &N))
	order by 1;
quit;

/*SQL SERVER*/
proc sql;
	SELECT * FROM generate_series(0,&N);
quit;

proc sql;
	SELECT * FROM UNNEST(GENERATE_ARRAY(0, &N));
quit;

 

Any ideas? Is there a similar function in SAS?

16 REPLIES 16
ChanceTGardener
SAS Employee

Look up the monotonic() function in proc sql. 

ChanceTGardener
SAS Employee

I meant to include the example code below in my last post.

 

proc sql;
 create table myair as 
  select *
  ,	     monotonic() as rowidx
  from sashelp.air;
quit;
Rabelais
Fluorite | Level 6

I see that you use an input table and count the rows. But what if N is 10000, 10000000 or bigger? Isn't possible to create such a table without using an input table?

ballardw
Super User

In general SQL does not support looping structures like DO i= . It also generally does not process records in anything resembling a specific order. So why is this supposed to be done in SQL?

If this just a learning exercise the take away is that this sort of activity is not an SQL strong point.

FreelanceReinh
Jade | Level 19

Hello @Rabelais,

 

I think in PROC SQL you might use the undocumented MONOTONIC function -- as ChanceTGardener has already suggested -- and so did I (reluctantly) when someone asked a similar question in 2019 (see Re: DO Loop with OUTPUT in SQL?). So, if you have a dataset at hand with at least N observations (e.g., the 428 obs. in SASHELP.CARS for N=100), you can use something like this:

%let N=100;

proc sql;
create table want(where=(i<=&N)) as
select monotonic() as i from sashelp.cars;
quit;

Otherwise it gets more complicated as you need to create such a dataset (or view) from scratch, e.g., like view BASE below:

proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;

select 'seed'||repeat(',seed',int(log2(&N)-1)) into :ds
from seed;

create view base as
select ' ' from &ds;

create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;

Again, this is really not recommended. Use a DATA step instead.

FreelanceReinh
Jade | Level 19

I've just noticed a minor inefficiency in the code that I copied from my old post: If N happens to be a power of 2 (2, 4, 8, 16, ...), view BASE creates 2N observations, while N would be sufficient. This can be avoided by replacing int(log2(&N)-1) with ceil(log2(&N)-2):

%let N=100;

proc sql noprint;
create table seed (x num);
insert into seed
set x=0
set x=1;

select 'seed'||repeat(',seed',ceil(log2(&N)-2)) into :ds
from seed;

create view base as
select ' ' from &ds;

create table want(where=(i<=&N)) as
select monotonic() as i from base;
quit;
Patrick
Opal | Level 21

There is the undocumented (and NOT supported) monotonic() functions mentioned by others. My advice is to never use undocumented functions in a production environment.

You can't do this with SAS ANSI SQL and you don't have to given it's so simple using a SAS datastep.

There are SQL extensions in many databases that allow for creation of such row numbering and if you need this in a database table then you can use explicit passthrough SQL.

Tom
Super User Tom
Super User

This sounds like an XY Problem.

 

Why do you want make such a dataset?
Why don't you just use a data step since you already showed you know how to that?

Why would you even want to attempt to do it in PROC SQL?

Rabelais
Fluorite | Level 6

@Tom

Curiosity is the lust of the mind!

As a new SAS user, I'm asking myself lot of questions.

 

Q2 & Q3: (1) Lot of tasks can be done by using different approaches and since I'm working with large datasets (this is not the case since generating even a big sequence is fast) I usually look for the most efficient one. (2) Since I'm new to SAS I often wonder if my code can be improved. (3) Also, sometimes I'm simply curious about different ways of doing the same task.

 

Q1: see the toy example below (it's just 10 rows, but I'm working with millions).

Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.

I was wondering if the second data step and the proc sql could be easily combined into a single proc sql or data step.

I tried using PROC EXPAND to interpolate the missing "i" values, but what it does is rather to replace a missing value with an interpolated one on an already present row, it doesn't add new rows.

 

data have;
input i c; cards;
7 0.2
3 0.5
6 0.3
run; data sequence; do i=1 to 10; output; end; run; proc sql; create table want as select a.i, b.c from sequence a left join have b on a.i = b.i; quit;

Rabelais_0-1709311865311.png

Rabelais_1-1709311879926.png

 
 

 

 

ballardw
Super User

@Rabelais wrote:

@Tom

 

Q1: see the toy example below (it's just 10 rows, but I'm working with millions).

Basically I need to "expand" the dataset HAVE so that "i" goes from 1 to N.

I strongly suggest that you show an example of what that expansion looks like. I can think of at least two possible different "expansions" but without an idea of the result should look like it isn't worth bothering to write any example code.

Rabelais
Fluorite | Level 6

@ballardw

I added pictures with code's output, i.e. HAVE and WANT datasets

ballardw
Super User
proc sort data=have;
   by i;
run;

data want;
  merge sequence
        have;
  by i;
run;

Assuming sequence is built as described.

The explicit sort and merge may run faster than SQL as SQL does that in the background for the Join on.

 

BUT you talk about "millions" of observations. Are any of the I values in HAVE ever duplicated? If so you need to include such in your example and perhaps describe rules involved.

Tom
Super User Tom
Super User

I would think PROC EXPAND could fill out a series for you with a constant value.  Perhaps you just did not find the right option for how to specify the constant value you wanted it to use?

 

But don't bother trying to make SQL do something like iterating.  That is not part of the grammar that SQL uses to describe the actions it can take.

 

If you want to make sure your data always has observations for ID=1 to 10 then do something like this:

 

data skeleton;
  do id=1 to 10;
    output;
  end;
run;
data want;
  merge skeleton actual_data;
  by id;
run;

No SQL needed (or wanted).

 

Reeza
Super User

SQL is designed to manage data and manipulate it, typically not to create new rows/records. Even a pivot isn't a simple operation in SQL. 

 

SAS on the other hand is more data wrangling/analysis friendly. 

 

Right tool for the right job. 

 

Several different sql options are listed here you can play around with different DBs, and different dbs will have different methods. 

But not SQL also has the ability to create unique key identifiers on each row automatically, which SAS does not. 

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 16 replies
  • 1155 views
  • 10 likes
  • 7 in conversation