good afternoon
i have a vast data-set which i want to process, the process can only be done when isolating subset of the entire data-set.
as you can see from the list below i want to be able to:
choose rows 2-6
rename the subset with an increment
run my processing %mymacro on this subset only
go back to the main data-set
choose rows 7-14
...
...
%mymacro
...
choose rows 15-22
...
...
%mymacro
...
choose rows 23-30
and so on until the macro has processed every subset up to 199 to EOF (last _n_ is not in the list for some reason).
my process macro is complete
i need to programmatically create code using the firstobs= and obs= data set options to create the subsets of the data that need processing.
so if a loop or macro was to go through the numbers above it would generate something like
data D000100_1a;
set D000100 (firstobs=2 obs=6); /* obs= is the last row to read, not a row count */
run;
data D000100_1b;
set D000100 (firstobs=7 obs=14);
run;
data D000100_1c;
set D000100 (firstobs=15 obs=22);
run;
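A loop of the kind described above could be sketched as a macro along these lines. This is only a sketch under stated assumptions: the macro name `split_run`, its `base=` and `starts=` parameters, and the numeric suffix on the subset names are all hypothetical, and it assumes `%mymacro` accepts the data set to process through a `data=` parameter, which the question does not confirm. Note that `obs=` names the last row to read rather than a row count, so each block ends one row before the next start, and the final block uses `obs=max` to run to the end of the file.

```sas
/* Sketch only: split_run, base=, starts= are hypothetical names.          */
/* ASSUMPTION: %mymacro takes the data set to process as data=<name>.      */
%macro split_run(base=, starts=);
  %local i n first last;

  /* How many start rows were supplied */
  %let n = %sysfunc(countw(&starts, %str( )));

  %do i = 1 %to &n;
    /* First row of this block */
    %let first = %scan(&starts, &i, %str( ));

    /* Last row = one before the next start; the final block runs to EOF */
    %if &i < &n %then
      %let last = %eval(%scan(&starts, %eval(&i + 1), %str( )) - 1);
    %else
      %let last = max;

    /* Subset named <base>_<i>, e.g. D000100_1 */
    data &base._&i;
      set &base (firstobs=&first obs=&last);
    run;

    /* Process this subset only (parameter name is an assumption) */
    %mymacro(data=&base._&i)
  %end;
%mend split_run;

/* Start rows taken from the list in the question; extend as needed */
%split_run(base=D000100, starts=2 7 15 23)
```

With the call shown, the macro generates steps reading rows 2-6, 7-14, 15-22 and then 23 to the end of the data set; the full list of start rows would have to be supplied in `starts=`.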
You would likely be way ahead in this project if you add a variable to identify the groups and then use BY group processing.
Example of creating output for each level of a variable using by group processing:
proc sort data=sashelp.class out=work.class;
by age;
run;
proc means data=work.class max min mean;
by age;
var height weight;
run;
proc print data=work.class;
by age;
var name sex;
run;
And what happens to your "row"=1?
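For the row ranges in the question, a group variable might be added roughly as follows. This is a minimal sketch, assuming the block start rows are the ones quoted in the question; the names `work.grouped`, `group` and `_k` are hypothetical, and rows before the first start (row 1 here) are simply left in group 0.

```sas
/* Sketch: tag each row with a group number instead of splitting the data. */
/* ASSUMPTION: the start rows below are the ones listed in the question.   */
data work.grouped;
  set D000100;
  array starts[4] _temporary_ (2 7 15 23);  /* first row of each block */
  group = 0;                                /* row 1 stays in group 0 */
  do _k = 1 to dim(starts);
    if _n_ >= starts[_k] then group = _k;   /* last start not beyond this row wins */
  end;
  drop _k;
run;

/* GROUP is already ascending, so BY processing needs no prior sort */
proc print data=work.grouped;
  where group > 0;   /* skip the unassigned leading row */
  by group;
run;
```

Any procedure that supports BY could take the place of PROC PRINT here.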
@teelov wrote:
you answer has no relevance to my question. and telling me i would be "further ahead" is slightly insulting.
I am sorry that you feel that way but your original requirement included:
the process can only be done when isolating subset of the entire data-set
Which is exactly what BY group processing does.
Or use of the data set option WHERE with a group variable equal to the desired group (or groups).
Or use of a WHERE statement in the very many procedures that support it.
None of these three approaches require creating multiple data sets, which you now have to reference explicitly and maintain.
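As a short illustration of the two WHERE forms, using the same sashelp.class data as the BY group example (the choice of ages is arbitrary):

```sas
/* WHERE= data set option: only the age 12 rows reach the procedure */
proc means data=sashelp.class(where=(age=12)) max min mean;
  var height weight;
run;

/* WHERE statement: same idea, here selecting two groups at once */
proc print data=sashelp.class;
  where age in (12, 13);
  var name sex age;
run;
```

In both cases the source data set is read in place and no intermediate subset is created.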
Plus, by default BY group processing tells you which group is being processed and sets values that can be used to do such things as enhance TITLE statements or name output tabs in spreadsheets. With separate data sets, if you want or need any of that functionality you have to add additional coding.
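For instance, the current BY value can be placed in a title with `#BYVAL`. A minimal sketch using sashelp.class, with the title text chosen only for illustration:

```sas
proc sort data=sashelp.class out=work.class;
  by age;
run;

options nobyline;                    /* suppress the default BY line */
title "Students aged #byval(age)";   /* #BYVAL inserts the current BY value */

proc print data=work.class;
  by age;
  var name sex;
run;

title;             /* clear the title */
options byline;    /* restore the default BY line */
```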
Another issue is that if you are defining your output sets based on arbitrary observation numbers, your process is extremely fragile: if you have to repeat it, you have to redefine every single start/end pair.
Consider what happens if your project manager comes up and says, by the way, these three records were missed previously and need to be incorporated. You will likely have to do a lot of work to get them into the correct group. If a group identifier variable was in your data set, you would only have to add the group identifier for the three records, append them to the data, sort for BY group processing and go.
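That append-and-sort step might look like the following sketch. Both data set names are hypothetical: `work.grouped` stands for the main data with a `group` variable, and `work.missed` for the three late records, which are assumed to carry their `group` value already and to have the same variables as the main data.

```sas
/* Hypothetical names: work.grouped (main data), work.missed (late records). */
/* ASSUMPTION: work.missed already has GROUP filled in.                      */
proc append base=work.grouped data=work.missed;
run;

/* Re-sort so the new records sit with their group for BY processing */
proc sort data=work.grouped;
  by group;
run;
```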
Or consider your project manager asking you to repeat the analysis but combine some arbitrary collections of records as different groups.
@teelov wrote:
you answer has no relevance to my question. and telling me i would be "further ahead" is slightly insulting.
Why does it have no relevance to your question? It's a method for iterative processing that is pretty much the standard in SAS.
If you want someone to code a specific answer, that's a consultant's job, not the job of a public user forum. The purpose of the forum is to help answer questions, but it's primarily volunteers answering the questions here, and they owe you absolutely nothing.