BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mediaeval
Calcite | Level 5

I have a program that reads regularly from this table. The time taken to read data from it tends to vary, but it should only take a few seconds. When I proc print it to the screen, it's 7,000 observations or so. Proc printing it takes 55 seconds to run. Any proc or data step that accesses it takes 55 seconds or so on average, and just now it took 2½ minutes. Any idea why this might be? The PC is a slow one, but even 1GB of RAM and a 2.2GhZ processor should have no problems with this. There are other processes running in the background, but the only place in the program that hangs is where it accesses this table. I can't access my C drive on Windows Explorer, where the file is, so I can't tell if my C drive is short of space. 20434  proc print data = sashelp.vcolumn; 20435      where libname = 'WORK' and memtype = 'DATA' and substr(name,1,3) = 'VAR'; 20436  run; NOTE: No observations were selected from data set SASHELP.VCOLUMN. NOTE: There were 0 observations read from the data set SASHELP.VCOLUMN.       WHERE (libname='WORK') and (memtype='DATA') and (SUBSTR(name, 1, 3)='VAR'); NOTE: PROCEDURE PRINT used (Total process time):       real time          2:30.42       cpu time            7.56 seconds

1 ACCEPTED SOLUTION

Accepted Solutions
Quentin
Super User

There are lots of ways to make a CSV file, but using your current macro as a starting point, below changes to use PROC CONTENTS to get the var list from the dataset, and PROC SQL to make a macro variable holding that list:

options mprint;

%Macro CreateRPTFile(OutputFile, Dataset, RecLength, Library=work);
%local varlist;

Filename Fname "&OutputFile"; 

proc contents data=&library..&dataset out=__varlist (keep=name) noprint;
run;

proc sql noprint;
  select distinct name into : varlist  separated by ',' 
  from __varlist 
  ;
quit;

**This datastep outputs all of the observations in the specified dataset to the external file; 
Data _Null_; 
  File Fname DLM = "," LRECL = &RecLength; 
  Set &Library..&Dataset;   
  If _n_ = 1 Then Put "!,&varlist";
  put "*," (_all_) (:);
run;

proc datasets library=work memtype=data nolist;
  delete __varlist;
run;
%Mend;


data test;
  array x{5};
  do id=1 to 5;
    do i=1 to dim(x);
      x{i}=ranuni(0);
    end;
    output;
  end;
  drop i;
run;

%CreateRptFile(d:/junk/test2.txt,test,100)

HTH,

--Q.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.

View solution in original post

20 REPLIES 20
RW9
Diamond | Level 26 RW9
Diamond | Level 26

The SASHELP.VCOLUMN is actually an SQL view, so you need to factor that in.  If you use it regularly then datastep to another area restricting on libname ne sashelp and then reference that dataset, save re-running each time.

Astounding
PROC Star

A couple of alternative approaches that might run faster ...

Try using PROC SQL on dictionary.columns (instead of using sashelp.vcolumn).

Try running PROC CONTENTS using the options NOPRINT and OUT= to capture the contents of a particular SAS data set.  Also note that you can capture information about all data sets in a library by specifying DATA=libname._all_.

jakarman
Barite | Level 11

Accessing the sashelp.vmember or dictionary.columns will result at runtime:

- query-ing of all active allocates libraries for existing tables

- for each table of that list query-ing on each table for the existing columns and their attributes.

All this queries are a lot of small pieces of information the system is not tuned and designed for. This same kind of performance burden can be seen opening a lot files with small data to the same size of data collected to be in some bigger files. There is a lot overhead accessing headers/metadata.

When a RDBMS is involved (libname) or a RLS (sas/connect) it go everytime round the circle as the information is not seen as a static table.

It is good example of very easy way of getting of information can cause expected bad performance.   
Copying the sashelp.vcolums first to a work.tabel and than go for selecting can help improving your performance without losing the ease on usage.

Advice: only once using the sashelp.vcloumn further processing on a static copy.

---->-- ja karman --<-----
mediaeval
Calcite | Level 5

Here is how I am using sashelp.vcolumn. I'm unsure how to replace the references to it, such that it will work. Also, can you guys see the following code properly? It's all run together in my browser. I'm using Internet Explorer 7.

%Macro CreateRPTFile(OutputFile, Dataset, RecLength, Library=work); **This data step writes the variable names contained in the SAS dataset that is specified in   the &Dataset macro variable, to the file specified in the OutputFile parameter; Filename Fname "&OutputFile"; Data _Null_; File Fname DLM = "," LRECL = &RecLength; Set sashelp.vcolumn; If _n_ = 1 Then Put "!," @; **For libname and memname to work, the names that feed into them must be in uppercase ; Where libname = Put("&Library",$upcase50.) and memname = Put("&Dataset",$upcase50.); Put name @; Run; **This datastep outputs all of the observations in the specified dataset to the external file; Data _Null_; File Fname DLM = "," LRECL = &RecLength Mod; Set &Library..&Dataset;     put "*," (_all_) (:); run; %Mend;

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Sorrry, I find it very difficult to read that code.  Do you just want to print a dataset to a text file with a delimiter?  Try proc export?  If not maybe clarify what you want out as I don't see what you are trying to achieve, is it perhaps to check the existence before outputting - then use the exist() function.

Tom
Super User Tom
Super User

Here is your problem.

Where libname = Put("&Library",$upcase50.)

SAS will not be able to optimize this since it involves the use of a run-time function call to the PUT() function.  So it will need to attempt to open every table in every active library.

Try changing it to:

Where libname = %upcase("&Library")

Now to regular SAS the where clause will contain a constant and SAS should be able to limit itself to just querying the specified library.



mediaeval
Calcite | Level 5

Tom, I changed my code to use the %ucase function but the times doubled, approximately. This probably isn't anything to do with the change of function, it's probably other factors. When I logged on this morning, a proc print statement on sashelp.vcolumn took 3:45.

Quentin
Super User

There are lots of ways to make a CSV file, but using your current macro as a starting point, below changes to use PROC CONTENTS to get the var list from the dataset, and PROC SQL to make a macro variable holding that list:

options mprint;

%Macro CreateRPTFile(OutputFile, Dataset, RecLength, Library=work);
%local varlist;

Filename Fname "&OutputFile"; 

proc contents data=&library..&dataset out=__varlist (keep=name) noprint;
run;

proc sql noprint;
  select distinct name into : varlist  separated by ',' 
  from __varlist 
  ;
quit;

**This datastep outputs all of the observations in the specified dataset to the external file; 
Data _Null_; 
  File Fname DLM = "," LRECL = &RecLength; 
  Set &Library..&Dataset;   
  If _n_ = 1 Then Put "!,&varlist";
  put "*," (_all_) (:);
run;

proc datasets library=work memtype=data nolist;
  delete __varlist;
run;
%Mend;


data test;
  array x{5};
  do id=1 to 5;
    do i=1 to dim(x);
      x{i}=ranuni(0);
    end;
    output;
  end;
  drop i;
run;

%CreateRptFile(d:/junk/test2.txt,test,100)

HTH,

--Q.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
mediaeval
Calcite | Level 5

Many thanks! This runs much more quickly. However, the variables are written to the output text file in alphabetical order. Is there an option on proc contents that will write the variables to __varlist in the order in which they were created in the input dataset?

Tom
Super User Tom
Super User

ORDER BY VARNUM

proc contents data=&library..&dataset out=__varlist (keep=name varnum) noprint;

run;

proc sql noprint;

  select distinct name into : varlist  separated by ','

  from __varlist

  order by varnum

  ;

quit;

Quentin
Super User

You can add varnum to the KEEP= option when PROC CONTENTS outputs the dataset.

Then add ORDER BY VARNUM to the SQL step.

You can also remove DISTINCT from the SQL step, since you can't have two vars with same name it isn't needed.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
Quentin
Super User

Agree with Astounding's suggestion that PROC CONTENTS is often faster than hitting sashelp.vcolumn (or dictionary.columns).

Also, are you sure there are only 7,000 obs in sashelp.vcolumn?  When I start PC SAS, I have more records than that, just from MAPS libraries and others.   Remember, it has information on the variables for every dataset in every library.... it gets big quick.  In BI setting with pre-assigned libraries, it can be enormous.

Another point is that using the SUBSTR() function on the where statement may mean that SAS can't use indexes to find the records more quickly.

The Boston Area SAS Users Group is hosting free webinars!
Next webinar will be in January 2025. Until then, check out our archives: https://www.basug.org/videos. And be sure to subscribe to our our email list.
mediaeval
Calcite | Level 5

Quentin, on login this morning, I'm seeing 5,200 observations when I run proc print on sashelp.vcolumn. Yesterday evening I saw something over 7,000. Running this morning's proc print took 3:45!

data_null__
Jade | Level 19

Do you see CEDA messages? 

15379 - Cross-Environment Data Access (CEDA) utilization message

NOTE: Data file libref.member.DATA is in a format native to

  another host or the file encoding does not match the

  session encoding. Cross Environment Data Access will

  be used, which may require additional CPU resources

  and reduce performance.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 20 replies
  • 4310 views
  • 8 likes
  • 7 in conversation