Hi,

 

The idea is to allow hyphen (-) notation, i.e. a numbered range list, in the DATA statement, so that the following code would work:

data b1-b10 ;
 set sashelp.class ;              
run ; 

and would create 10 data sets, b1 to b10.

This would make the DATA statement symmetric with the SET statement, which already accepts such lists:

data a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 ;
  do value1 = 1 to 10;
    output;
  end; 
run ;       

data All_a1_to_a10 ;
 set a1-a10;              
run ; 

All the best

Bart

 

13 Comments
PGStats
Opal | Level 21

Dataset lists in the SET statement usually refer to a series of datasets that need to be concatenated.

How would creating many copies of the same dataset be useful? Such a feature would not promote good programming practice, and would be contrary to normalization principles since it would create many versions of the same information.

 

Why not ask for the same feature for the OUTPUT statement?

 

if condition then output a1-a5;
else output a6-a10;
yabwon
Onyx | Level 15

Hi @PGStats,

 

Thanks for the comment. 

 

That is a very interesting point about the OUTPUT statement. I agree that it would be great to have such a complementary feature too.

 

As for how the hyphen would be useful in the DATA statement: it could create a list of empty "shell" data sets. Let me put it in code:

data summary201901-summary201912;
  length
    a 8
    b 8
    c $ 16
    d $ 64
    e 8
    ;
    stop;
    retain a b e . 
           c d   " ";
run;

All the best

Bart

ballardw
Super User

Could you please describe an actual use case? I am not seeing any immediate need, which makes this a somewhat radical departure from basic SAS code.

 

(Not to mention that I can implement this with a 5-line macro, at least for the DATA statement; OUTPUT is a touch trickier.)
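
For readers wondering what such a macro might look like, here is a minimal sketch. The macro name %dslist and its parameters are hypothetical and are not taken from this thread; it simply expands to a space-separated list of numbered data set names that the DATA statement already accepts.

```sas
/* Hypothetical helper: expands to &prefix.&start ... &prefix.&stop */
%macro dslist(prefix, start, stop);
  %local i;
  %do i = &start %to &stop;
    &prefix.&i
  %end;
%mend dslist;

/* Resolves to: data b1 b2 b3 b4 b5 b6 b7 b8 b9 b10; */
data %dslist(b, 1, 10);
  set sashelp.class;
run;
```

The same call could be placed in an OUTPUT statement, but routing different records to different data sets would still need explicit logic.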

yabwon
Onyx | Level 15

Hi @ballardw,

 

I know it can be done with macro code; it could even be implemented without any macro code:

data _null_;
  array A[10] (1 : 10);
  call execute('data b'
       !! catx(' b', of A[*])
       !! '; set sashelp.class; run;');
run;

but the idea is to be able to do it without "additional fancy/tricky effort". A favour to less experienced users, plus syntactic sugar.

 

All the best

Bart  

ballardw
Super User

I really question the need though. Creating duplicates of a data set is likely better done with another procedure, and any code that directs different records to different output data sets is going to entail enough other work with the explicit OUTPUT statements that the extremely limited saving of effort from this option doesn't seem to save much actual programming time.

Quentin
Super User

I like this idea, just for the general idea of consistency.  That is, SAS has the concept of variable lists (https://documentation.sas.com/?docsetId=lrcon&docsetTarget=p0wphcpsfgx6o7n1sjtqzizp1n39.htm&docsetVe...) , and you can use different types of variable lists anywhere that a list of variables is valid code.

 

So seems odd that the concept of a data set list is defined only for the SET statement.

https://documentation.sas.com/?docsetId=lestmtsref&docsetTarget=p00hxg3x8lwivcn1f0e9axziw57y.htm&doc...

 

Since I can do below (I hadn't realized the OUTPUT statement would work with an explicit list...)

data b1 b2 b3 ;  
  set a1 a2 a3 ;
  output b1 b2 b3 ;
run ;

Seems reasonable that I would be able to do:

data b1-b3 ;  
  set a1-a3 ;
  output b1-b3 ;
run ;

and maybe even:

 data b1-b3 foo ;  
  set a1-a3 ;
  if x=1 then output b: ;
  else output foo;
 run ;

 

The code already accepts explicit lists of data set names.  Why not allow a numbered range list or name prefix list?

yabwon
Onyx | Level 15

Hi @ballardw,

 

Thanks for your opinion! As they say: "I agree to disagree" on this subject 🙂 Let me make use of @Kurt_Bremser 's maxim 12: "make it look nice" [I know I'm "overusing" it a bit;-)]

 

Hi @Quentin,

 

Thanks for the supporting points!

 

All the best

Bart

PhilC
Rhodochrosite | Level 12

An "array of dataset references"? Is that what we want? That's a separate request, right? I'm not really in favor of this, and I've never needed it, but I think an example demonstrating these features would be the following.

 

data a1-a5;
/*data a1 a2 a3 a4 a5;*/
  set sashelp.class; 
  r=int(ranuni(0)*5)+1; drop r;
  select (r) ;
    when (1)  output a1;
    when (2)  output a2;
    when (3)  output a3;
    when (4)  output a4;
    otherwise output a5;
  end;
run;

data a1-a5;
  set sashelp.class; 
  r=int(ranuni(0)*5)+1; drop r;
  array a [5]  a1-a5 /datasets;  /*hypothetical array of dataset references*/
  output a[r];
run;

 

I'm lost because I don't see why one needs the notation to create multiple copies of a dataset.  

Quentin
Super User

Yes, I would say array of dataset references would be a separate request.  I was just thinking syntax for specifying a list of dataset references.

 

Array of dataset references where you can do: 

 output a[r];

is a neat idea, but I'm not sure it would be feasible in the current DATA step. Currently, the OUTPUT statement needs to know which data set it points to when the step compiles. For the above to work, it would need to resolve a[r] while the step is executing.

 

That's the kind of dynamic data writing you can do with the hash object output method, which is pretty nifty.  But I'm not sure DATA step output statement could do it.
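
As a rough illustration of that hash-based approach, here is a minimal sketch. It assumes sashelp.class is available; the names class_sorted and the class_ prefix are made up for this example. The name of each output data set is built while the step is executing, which is exactly what the OUTPUT statement cannot do.

```sas
/* Sort so that each BY group is contiguous (class_sorted is a hypothetical name) */
proc sort data=sashelp.class out=class_sorted;
  by sex;
run;

data _null_;
  /* A fresh hash instance is created on every iteration of the implicit loop,
     i.e. once per BY group */
  declare hash h(multidata: 'y');
  h.defineKey('sex');
  h.defineData('name', 'sex', 'age', 'height', 'weight');
  h.defineDone();

  /* Load all rows of the current BY group into the hash */
  do until (last.sex);
    set class_sorted;
    by sex;
    h.add();
  end;

  /* The target data set name is resolved at run time, one data set per value of sex */
  h.output(dataset: cats('class_', sex));
run;
```

This splits by an existing variable rather than by a random index as in the a[r] example, but the principle of choosing the target data set at run time is the same.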

yabwon
Onyx | Level 15

@PhilC  and @Quentin.

 

As you wrote, "an array of datasets" would be a totally separate idea. My idea is, as Quentin said, just a syntax for specifying a list of dataset references.

 

All the best

Bart

 

P.S. @PhilC your idea of "an array of datasets" is very appealing; I would up-vote it if you decide to submit it as a ballot idea.

 

P.S.2 By the way, the idea for this post was born during my discussion with @hashman about creating a list of datasets from within DATA step boundaries.