DATA Step, Macro, Functions and more

Create a data set by grabbing the first <insert number> records from a larger one

Accepted Solution Solved
Reply
Regular Contributor
Posts: 216
Accepted Solution

Create a data set by grabbing the first <insert number> records from a larger one

Hi Everyone

I have a data set with about 100,000 records in it. I would like to create a smaller data set from that one that is composed of the first 50,000 records. I know I can add a sequential counter column to it and then do it that way, but I was wondering if there is a way to do this by just selecting the first 50,000 records of the larger one on the fly?

 

Paul


Accepted Solutions
Solution
‎06-08-2016 03:28 PM
Respected Advisor
Posts: 4,936

Re: Create a data set by grabbing the first <insert number> records from a larger one

Use a view:

 

data short / view=short;
set have;
if _n_ > 50000 then stop;
run;

proc summary data=short;

...

run;

PG

View solution in original post


All Replies
Solution
‎06-08-2016 03:28 PM
Respected Advisor
Posts: 4,936

Re: Create a data set by grabbing the first <insert number> records from a larger one

Use a view:

 

data short / view=short;
set have;
if _n_ > 50000 then stop;
run;

proc summary data=short;

...

run;

PG
Regular Contributor
Posts: 216

Re: Create a data set by grabbing the first <insert number> records from a larger one

Thanks PG. That works!

 

Paul

Super User
Posts: 5,518

Re: Create a data set by grabbing the first <insert number> records from a larger one

Also note ...

 

You can subset directly:

 

proc summary data=have (obs=50000) .....

 

If the subset is for use in multiple steps, be wary of using a view.  Your program will likely run faster if you create a data set rather than a view.  Each time a view is used, it has to re-execute the instructions within it. 

Respected Advisor
Posts: 3,799

Re: Create a data set by grabbing the first <insert number> records from a larger one

Posted in reply to Astounding

@Astoundingfor the subset of first 50,000 obs using OBS= vs the view proposed by @PGStats or a new data set with 50,000 obs it will make little different if any for  each method will still read 50,000 records and stop.

 

Super User
Posts: 5,518

Re: Create a data set by grabbing the first <insert number> records from a larger one

Posted in reply to data_null__

Rather than try to debate the issue, I ran a test.  Granted, it's one machine and one operating system.  I tested 10,000 runs against a data set, and it took about 14 minutes.  Not wanting to wait that long, I tested only 5,000 runs against a view.  That took nearly 12 minutes (almost twice as long per run).  The test data set contained 100 variables and (originally) 100,000 observations.

 

Each "run" was a simple step:

 

data want;

set subset;

run;

 

or

 

data want;

set viewname;

run;

 

At this point, I'm still convinced that the view is slower for multiple uses. 

Respected Advisor
Posts: 4,936

Re: Create a data set by grabbing the first <insert number> records from a larger one

Posted in reply to Astounding

Yes @Astounding, there is certainly extra buffering involved in using a view. But OP's question was for an "on the fly" method, which I assume is something to be used just once.

 

OBS= dataset option is probably the most efficient method when nothing more than limiting the number observations read is required.

PG
Respected Advisor
Posts: 3,799

Re: Create a data set by grabbing the first <insert number> records from a larger one

[ Edited ]
Posted in reply to Astounding

Yes @Astounding you are correct and I didn't intend to suggest that VIEW was faster, only that for this example it won't make much difference. 

 

options Compress=yes

NOTE: Elapsed time VIEW:             00:01:04
NOTE: Elapsed time DATA:             00:00:45

options compress=no

NOTE: Elapsed time VIEW: 00:01:13
NOTE: Elapsed time DATA: 00:00:15

 

This test program attempts to remove all but reading from the test.

 

 

data example;
   array a[100] (1:100);
   do _n_ = 1 to 1e5;
      output;
      end;
   run;
data testv / view=testv;
   set example(obs=50000);
   run;
data testd;
   set example(obs=50000);
   run;

%macro test(n);
   %local _begin_ i;
   %let _begin_ = %sysfunc(datetime());
   options notes=0;
   %do i = 1 %to &n;
      data _null_;
         set testv;
         run; 
      %end;      
   options notes=1;
   %put NOTE: Elapsed time VIEW: %sysfunc(intck(seconds,&_begin_,%sysfunc(datetime())),TOD20);   
   %let _begin_ = %sysfunc(datetime());
   options notes=0;
   %do i = 1 %to &n;
      data _null_;
         set testd;
         run; 
      %end;      
   options notes=1;
   %put NOTE: Elapsed time DATA: %sysfunc(intck(seconds,&_begin_,%sysfunc(datetime())),TOD20);   
   %mend;
%test(250);

 

Regular Contributor
Posts: 216

Re: Create a data set by grabbing the first <insert number> records from a larger one

Posted in reply to Astounding

Yes Astounding, I actually tried this method after and it works fine as well. Going forward, this is probably what I would use.

 

Paul

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 8 replies
  • 382 views
  • 4 likes
  • 4 in conversation