Hi all SAS Users,
Today, when reading a suggestion from @ballardw about TAGSORT in dealing with large dataset, I touched a document.
When reading this document, there is one point quite strange to me
It is quite strange to me because the view is just an input or an instruction, not a dataset, how come we sort it afterward.
My understanding is "We can not sort a VIEW".
Could you please explain it to me?
Warm regards.
There's no need for extra "Save" or "Save as" actions. Just use the Windows Explorer to navigate to the WORK library and then open the .sas7bvew file with Notepad. To find out the physical path of the WORK library you can use:
%put %sysfunc(pathname(work));
which will write the path to the log. In my default local SAS EG installation (which I normally don't use, therefore it's not customized) the path contains 11 (eleven!) subfolder names.
Hi @Phil_NZ,
@Phil_NZ wrote:
It is quite strange to me because the view is just an input or an instruction, not a dataset, how come we sort it afterward.
My understanding is "We can not sort a VIEW".
It's not the view itself that is sorted, but the dataset created by executing the instructions stored in the view. See the two blocks of notes in the log from the PROC SORT step: The first refers to the view (TEMP) and the second to the dataset (WIDE) used by the view.
In the example it is mandatory to use the OUT= option of the PROC SORT statement with a different dataset name because the resulting dataset must not have the same name as the view if it is to be stored in the same library.
Thank you for your explanation and insightful suggestion.
I am still a little bit confused here.
It's not the view itself that is sorted, but the dataset created by executing the instructions stored in the view
Could you please tell me what is the dataset created by executing the instructions stored in the view? From my standpoint, in the first block of the dataset, view is an instruction created by reading the dataset wide (Shortly speaking, view is an instruction of wide with less variables). And in the second block of code, SAS will sort the instruction temp then output to another dataset narrow.
I think I may fall into a fallacy, but from my understanding, temp here is an instruction, and the second block of code is working on such instruction and it is unreasonable.
Warmest regards.
@Phil_NZ wrote:
From my standpoint, in the first block of the dataset, view is an instruction created by reading the dataset wide ...
I'd rather say: The DATA step in this example creates a set of instructions (stored in a file temp.sas7bvew) which can be used later to read the observations of a dataset named WIDE, restricted to three variables -- just, three and vars -- of the types (i.e., numeric or character) found in dataset WIDE at the time when the view is created.
@Phil_NZ wrote:
And in the second block of code, SAS will sort the instruction temp then output to another dataset narrow.
Not the instructions will be sorted. Normally, the SORT procedure operates on an existing physical dataset specified in the DATA= option of the PROC SORT statement (or the default dataset _LAST_). Here, however, the observations to be sorted (constituting the "dataset" I referred to in the sentence you quoted) must first be created, as they don't exist yet. This creation is accomplished by executing the instructions stored in the view, using the dataset WIDE that exists at the time when PROC SORT executes (which might be a different dataset than the WIDE dataset mentioned above, but in the example it's the same). So, the PROC SORT step will take longer than it normally would because the dataset to be sorted isn't ready yet. But this "non-existence" can obviously save a lot of disk space because the .sas7bvew file is small regardless of the size of dataset WIDE. PROC SORT processes the (partial) observations obtained from dataset WIDE via the view in the same way as it would process similar observations read from an existing physical dataset.
Thank you for your dedicated explanation. There are two points I am curious here:
1. I think I can totally apply the mechanism you just say to this "wrong" code:
data iii/view=iii;
set sashelp.class (keep=Height Sex Name);
run;
proc sort data=iii tagsort;
by Sex;
run;
And the log is
39 proc sort data=iii tagsort;
40 by Sex;
41 run;
ERROR: Cannot sort in place.
ERROR: Unable to create WORK.III.DATA because WORK.III.VIEW already exists.
I am wondering if I fell into any fallacy here.
2. You said that
@FreelanceReinh wrote:The DATA step in this example creates a set of instructions (stored in a file temp.sas7bvew)
I am not sure if I get you wrongly, but when I run the code
data iii/view=iii;
set sashelp.class (keep=Height Sex Name);
run;
And then I export the view iii (Right click => Share =>Export), I get a file with the tail sas7bdat rather than sas7bvew, could you please help me about this difference?
Warm regards.
A view points to data: It provides data, but does not contain data.
1. The view is used as input. You need an output. proc sort cannot create a sorted view, only a data set. So your code cannot possibly work. Also note that were this possible, it would make no sense to use the same view name: the sorted view points back to the unsorted view. But the sorted view would be overwritten by the sorted view, so the sorted view would point back to itself.
2. When you export, the data is exported.
You seem keen to learn about views. Views have a very useful role, but have a sizeable performance overhead. Consider:
data T; do I=1 to 1e8; output; end; run; * Create table ;
data _V/view=_V; set T; run; * Create view ;
data _null_; set T; run; * Read table: 5 seconds ;
data _null_; set _V; run; * Read view: 20 seconds ;
Just keep that in mind: Views have their place, but views are costly to use.
Hi @ChrisNZ and @FreelanceReinh
I think I know the reason, you are right @ChrisNZ , let me try to explain my idea. I try to run this code and this time it successfully runs after adding an output.
data hhh/view=hhh;
set sashelp.class (keep=Height Sex Name);
run;
proc sort data=hhh tagsort out=jjj;/*I add the out dataset here*/
by Sex;
run;
So, I rethink about what you said. So, the proc sort will not sort on view (hhh) but it will write the sorted value from hhh on an empty dataset jjj. So, it means, it cannot sort on a view but can write down the data calculation or sorting from view to an output. I hope that I understand your explanation correctly.
Hi @ChrisNZ , I am keen on learning further about SAS (not only VIEW) because I am falling in love with it. Thank you for letting me know about the time-consuming of VIEW, it is the first time I notice about that. The more I learn about it, the more I see its beauty. And the more I discuss with all of you here, the more I grow up not only in coding but also in my characteristics.
Regarding views, I keen on learning it because it is a very nice way to deal with a large dataset. And yeah, there is no free lunch, so if it cost a little space then time-consuming makes sense. The same story happening when we do proc sort with and without TAGSORT option.
Warmest regards and have a good week,
Phil.
Thanks to @ChrisNZ for continuing the discussion during the European night hours.
@Phil_NZ: Normally, a PROC SORT statement without an OUT= option overwrites the input dataset with the sorted dataset (having the same name). However, if a view (.sas7bvew file) is specified as input (in the DATA= option), creating a sorted output dataset (.sas7bdat file) with the same name in the same library would cause an ambiguity as to which of the two files -- the view or the dataset -- will be meant when that name is used in a subsequent DATA or PROC step. This is what the error message
ERROR: Unable to create WORK.III.DATA because WORK.III.VIEW already exists.
points out. Specifying a different dataset name in the OUT= option avoids this problem easily.
To better understand the huge difference between the view and the corresponding dataset (e.g., obtained by "exporting" the view), you can open both files (using your small example involving SASHELP.CLASS) with a text editor such as Notepad. Both are binary files, so be prepared to see a lot of "gibberish." But interspersed between all those strange characters you will see several snippets of human-readable text: In the .sas7bvew file these include metadata like variable names ("Name", "Sex", "Height") and most notably SAS code ("data iii/view=iii;", "set sashelp.class", ...), but NEVER any data (like "Alfred", "M", "Barbara", "F"). In the .sas7bdat file, however, you will see these character data values. (The numeric values, here: from variable Height, are coded and therefore less easily recognized, but they are definitely in there.) Of course, metadata is contained in the dataset as well, but NO SAS code (e.g., a SET statement).
Thank you for your dedicated explanation, now I get the point.
Regarding the sas7bvew and sas7bdat, I saved them to my desktop, opening them but can't see the difference as you documented, could you please help me to have a look please?
data jjj;
set sashelp.class (keep=Height Sex Name);
run;
data hhh/view=hhh;
set sashelp.class (keep=Height Sex Name);
run;
But when I open them by Notepad, there is nothing like what you mentioned
and
Warmest regards.
@Phil_NZ - If your datasets are compressed then very little will be readable I expect.
Hi @SASKiwi and @FreelanceReinh
Thank you for your suggestion, after setting compress=no, the result seems not align with what I understand from your suggestion @FreelanceReinh , I am wondering if I fall into any fallacy.
data mmm (COMPRESS=NO);
set sashelp.class (keep=Height Sex Name);
run;
data lll /view=lll;
set sashelp.class (keep=Height Sex Name);
run;
And when I open the file "lll", I have something that from your comment, it should not happen:
Warm regards,
@Phil_NZ wrote:
Thank you for your dedicated explanation, now I get the point.
You're welcome. I'm glad to hear that.
@Phil_NZ wrote:
Regarding the sas7bvew and sas7bdat, I saved them to my desktop, opening them but can't see the difference as you documented, could you please help me to have a look please?
The jjj.sas7bdat looks good, but you need to scroll down to see the character values ("Alfred", etc.) and still further down to see the metadata ("Name", etc.). There's a lot of white space (consisting of '00'x characters) in the file.
I think you haven't shown the hhh.sas7bvew file, though. Both the icon (obviously the same as that of jjj.sas7bdat; the icon of the view should contain a magnifying glass) and the text "DATA" in your screenshot indicate that you opened a .sas7bdat file instead. The real hhh.sas7bvew file will have the text "VIEW" in place of "DATA".
Edit:
As to the lll file: Again, you are looking at a .sas7bdat file.
Thank you for your explanation, can I ask how can I save a file under sas7bvew then? From the drop-down window, I can only see sas7bdat.
Warm regards.
There's no need for extra "Save" or "Save as" actions. Just use the Windows Explorer to navigate to the WORK library and then open the .sas7bvew file with Notepad. To find out the physical path of the WORK library you can use:
%put %sysfunc(pathname(work));
which will write the path to the log. In my default local SAS EG installation (which I normally don't use, therefore it's not customized) the path contains 11 (eleven!) subfolder names.
Thank you very much @FreelanceReinh
I am wondering how come you remember exactly each code like that, I am overwhelmed by how fast you reply to me, can I ask about the experience though? Because it takes me time to remember the codes .....
Respect and warm regards.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.