Hi all SAS Users,
These two days I deal with a 128 GB file, and I fell super hopeless because just a very simple code, it takes around 30 minutes and still be there (around 12 million obs in this file). So now I decide to cut it into many small pieces and deal with each of them.
I have three questions here:
1. How to check with how many observations, my computer can perform reasonably, or just assign a number and test until we feel the computer being slow gradually?
2. How to subsample from observation N1 to N2.
For example, the code to retrieve the first 10000 obs is as below:
data want;
set have (obs=10000);
run;
So, what if I want a new dataset from obs 10001 to 20000, what code I should perform?
3. A further question of my curiosity only, I have 11 million observations in the file have, is there any way to split this whole dataset to many subsamples that each subsample contains 1 million observation?
Warmest regards.
@Phil_NZ wrote:
Hi all SAS Users,
....
2. How to subsample from observation N1 to N2.
For example, the code to retrieve the first 10000 obs is as below:
data want; set have (obs=10000); run;
So, what if I want a new dataset from obs 10001 to 20000, what code I should perform?
data want;
set have (firstobs=10001 obs=20000);
run;
3. A further question of my curiosity only, I have 11 million observations in the file have, is there any way to split this whole dataset to many subsamples that each subsample contains 1 million observation?
You could define 11 data set VIEWS. They take no space but would (presumably) save you lots of typing when analyzing 1,000,000 obs at a time.
data have1 /view=have1;
set have (obs=1000000);
run;
data have2 /view=have2;
set have (firstobs=1000001 obs=2000000);
run;
.....
data have 11/view=have11;
set have (firstobs=10000001);
run;
Then you can selectively run you analyses on HAVE1, HAVE2, etc.
@Phil_NZ wrote:
Hi all SAS Users,
....
2. How to subsample from observation N1 to N2.
For example, the code to retrieve the first 10000 obs is as below:
data want; set have (obs=10000); run;
So, what if I want a new dataset from obs 10001 to 20000, what code I should perform?
data want;
set have (firstobs=10001 obs=20000);
run;
3. A further question of my curiosity only, I have 11 million observations in the file have, is there any way to split this whole dataset to many subsamples that each subsample contains 1 million observation?
You could define 11 data set VIEWS. They take no space but would (presumably) save you lots of typing when analyzing 1,000,000 obs at a time.
data have1 /view=have1;
set have (obs=1000000);
run;
data have2 /view=have2;
set have (firstobs=1000001 obs=2000000);
run;
.....
data have 11/view=have11;
set have (firstobs=10000001);
run;
Then you can selectively run you analyses on HAVE1, HAVE2, etc.
1) No idea, your hardware, your network limitations
2) See if this gives you an idea:
data junk1; set sashelp.class (firstobs=1 obs=5); run; data junk2; set sashelp.class (firstobs=6 obs=10); run; data junk3; set sashelp.class (firstobs=11 obs=15); run; /*or*/ data junk1 junk2 junk3 ; if 1 le _n_ le 5 then output junk1; else if 6 le _n_ le 10 then output junk2; else if 11 le _n_ le 15 then output junk3; run;
Not saying this is a good idea.
It would make much more sense to subset data by some characteristic.
3. this is the same question as 2 just phrased differently.
Do you need every single variable in the data? Reducing variables may make more sense as then all the values of the specified variables are available. Select the variables needed to completely identify the records and then the ones likely needed for specific analysis.
It may be easier when the inevitable question about "what if" comes up to add in a couple of additional variables for an analysis than having to find them from multiple data sets.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.