Hi All, I thought this was an interesting problem, and a chance for me to "play" with some techniques in SAS I don't normally use. Apologies in advance for the length of this post.

General Concepts:

Use of sashelp.vcolumn:

In my environment, we pre-allocate about 30 SAS libraries with hundreds of tables and tens of thousands of variables. Some of those libraries point to Oracle databases. After the incredibly painful process (~10 minutes) of retrieving 4 variables, the code generation finally began! However, even when I cleared all libnames (libname _all_ clear;), sashelp.vcolumn was still really slow. I recommend using proc contents instead of sashelp.vcolumn. Compare the three approaches below. I also switched to other datasets (sashelp), because we also want to compare performance across different approaches, and the example data is just too small to evaluate performance.

* using dictionary.columns ;
proc sql;
   create table vars as
      select libname, memname, name
      from dictionary.columns
      where catx(".",libname,memname) in ("SASHELP.CLASS","SASHELP.CARS","WORK.ZIPCODE")
      order by libname, memname, varnum
   ;
quit;

* using sashelp.vcolumn ;
data vars;
   set sashelp.vcolumn (keep=libname memname name);
   where catx(".",libname,memname) in ("SASHELP.CLASS","SASHELP.CARS","WORK.ZIPCODE");
run;

* using proc contents ;
%macro get_variables(data);
   proc contents data=&data out=temp (keep=libname memname name) noprint;
   run;
   proc append base=vars data=temp;
   run;
%mend;

proc datasets lib=work nowarn nolist;
   delete vars;
quit;

%get_variables(sashelp.class);
%get_variables(sashelp.cars);
%get_variables(work.zipcode);

The actual performance will vary based on your environment, but in general I find the best performance is 1) proc contents, 2) dictionary.columns, and 3) sashelp.vcolumn, in that order, especially if you just need the columns from a single dataset. Try each approach in your environment and see which works best for you.

call execute vs.
dynamic code generation:

There is nothing intrinsically wrong with call execute. However, I generally prefer writing dynamically generated code to a temporary file, then %including that temporary file. This way, I can easily debug the generated code using "fslist". This assumes code generation via DMS; otherwise use a data _null_ step to echo the code to the log. I also get more control over the code formatting. This has nothing to do with code execution, but can be helpful with debugging if your generated code block is large. Here is an example:

* dynamically create code ;
filename code temp;

data _null_;
   set vars end=eof;
   file code;
   if _n_ eq 1 then do;
      call symputx("firstvar",name,"G");
      put @1 "proc sql;";
      put @4 "create table temp as";
      put @7 "select";
   end;
   put @10 memname $quote. @80 "as Table,";
   put @10 name $quote. @80 "as Name,";
   put @10 "nmiss(" name +(-1) ")" @80 "as Missing,";
   put @10 "sum(case when cats(" name +(-1) ")='N/A' then 1 else 0 end)" @80 "as Not_Applicable,";
   put @10 "sum(case when cats(" name +(-1) ") not in ('N/A', ' ') then 1 else 0 end)" @80 "as Not_Missing,";
   if eof then do;
      put @10 "0" @80 "as dummy";
      put @7 "from";
      put @10 libname +(-1) "." memname;
      put @4 ";";
      put @1 "quit;";
   end;
run;

* check out the generated code ;
dm "fslist code";

* execute the code ;
%include code;

Note: You don't need the "firstvar" macro variable for the proc transpose (see later code below), but the above shows a way to set it to the first variable in the source data.

Missing option:

The character representation of a missing numeric value is controlled by the missing option, which by default is ".". If you want to make absolutely sure that your code works correctly when checking for missing numeric values, explicitly set the missing option. You can also simplify the code slightly if you explicitly set it to blank:

* reset character used for missing numeric data ;
%let missing=%sysfunc(getoption(missing));
options missing=" ";

/* Your code.
Both character and numeric missing values are now " " */

* restore character used for missing numeric data ;
options missing="&missing";

Increasing disk I/O performance:

This is more of an aside. I did test this approach, but it did not perform better than using views. When I need high-performance disk I/O, I've been using this approach lately:

libname spdework spde "%sysfunc(pathname(work))" temp=yes;

Then use a two-level name, spdework.<your work dataset>, instead of a one-level work name. This can yield really good performance, especially for a large work dataset that is used repeatedly in downstream code. You can also set the user= option so that single-level names write to spdework:

options user=spdework;
data foo; x=1; run;

Use views when appropriate:

Use a data step or SQL view when you can to reduce disk I/O. Determining "when appropriate" is beyond the scope of this post, but see the code using a data step view below.

Problem Analysis:

If we review the problem, what we want to do is group the data into "buckets" (missing, not applicable, not missing) for each variable, then get frequency counts for those buckets. Further analysis of the previous answers shows that we're generating our results one column at a time, over the entire dataset – each proc sql/select/union all code block is processing the entire dataset, one variable at a time. The original poster indicated that his/her "real" data is 2M+ records. It may also have many more variables than the 4 in the sample datasets. So the total data processed is # of records * # of variables – for example, 2M records * 10 variables = processing 20M records. In the "real" problem, performance does matter.

I also considered: "Is there a way to do this without pre-processing the data (sashelp.vcolumn or proc contents) and dynamic code generation?" And: "Is there a way to process the data in one pass?"
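As an aside on the missing option point above: with options missing=" ", the formatted value of a missing numeric is blank, so a single missing(vvaluex(...)) test covers both character and numeric variables. Here is a minimal sketch of that behavior (my own illustration, not part of the solutions below; the variable names are made up):

* illustration: missing numerics format as blank under options missing=" " ;
%let missing=%sysfunc(getoption(missing));
options missing=" ";

data _null_;
   x=.;      * missing numeric ;
   c=" ";    * missing character ;
   put "formatted missing numeric: [" x "]";   * prints blank, not "." ;
   if missing(vvaluex("x")) and missing(vvaluex("c")) then
      put "both variables test as missing via vvaluex";
run;

options missing="&missing";

With the default missing=".", vvaluex("x") would return "." and the missing() test on the formatted value would fail for numerics, which is why the solutions below set the option explicitly.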
I created a number of code versions – I'll only include a few of the approaches (I won't post the disasters lol). Some comments:

A data step view performed better than creating an SPDE work dataset.

Using call vnext and vvaluex, I did not have to pre-process the data.

I found it easier to process the datasets one at a time and use proc append, especially since I didn't use SQL.

In the proc summary approach, I still process # of records * # of variables, so the data does "bloat". Luckily, the logic checks are mutually exclusive – a value can ONLY be one of missing, not applicable, or not missing. Otherwise the dataset (data step view) would bloat even more.

In the first hash object approach, I only process the data once. But it performed the same as the "bloated" data step view. I conjecture that the overhead of looking up and replacing the counts for every data value was enough that it performed no better than the proc summary approach. But it was fun coding this approach :-).

I also found a documentation hit that was really interesting: http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a002585310.htm (search on "Maintaining Key Summaries"). It took me a while to understand this doc – here is my understanding: when you declare a suminc variable, the hash object maintains an increment counter on all its keys. When a key is found, the value of the suminc variable is added to the current internal value maintained by the hash object for that key. For the first "hit" on a hash object key, you need to use add() instead of find(). At the end of the processing, use the sum() method to retrieve the increment counter for a given key. Since we're just interested in frequency counts, the value of the suminc variable is always 1. See an example below.

Here are my code examples. Hopefully they are somewhat self-explanatory; otherwise post a follow-up question.
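To make the suminc mechanics concrete before the full solutions, here is a stripped-down sketch of just that technique (my own illustration; the hash, key, and dataset names are made up and are not part of the solutions that follow):

* minimal suminc illustration: count rows per value of Sex in sashelp.class ;
data _null_;
   set sashelp.class end=eof;
   length key $8;
   if _n_=1 then do;
      dcl hash h(suminc: "one");   * "one" is the increment variable ;
      h.defineKey("key");
      h.defineDone();
   end;
   key=sex;
   one=1;
   /* first hit on a key must use add(); subsequent hits use find(), */
   /* which silently adds "one" to the key's internal counter        */
   if (h.find() ne 0) then h.add();
   if eof then do;
      * retrieve the internal counter for one key with the sum() method ;
      key="F";
      rc=h.sum(sum: count);
      put "count for " key= count=;
   end;
run;

Note that the counter lives inside the hash object, not in a data portion variable – that is why sum() is needed to get it back out.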
Code Prologue (create example data used for all programs):

options mprint nomlogic;

data table1;
   input Name $ Age;
   cards;
John 45
N/A 30
. 15
Carl 25
;
run;

data table2;
   input Color $ Height;
   cards;
Blue 110
N/A 120
. 100
Red .
;
run;

* create a large dataset to test performance ;
data work.zipcode;
   set sashelp.zipcode sashelp.zipcode sashelp.zipcode sashelp.zipcode sashelp.zipcode;
run;

%bench macro (macro I use when I'm benchmarking performance):

Note: comment out the call to %parmv.

/*=====================================================================
Program Name          : bench.sas
Purpose               : Measures elapsed time between successive
                        invocations.
SAS Version           : SAS 8.2
Input Data            : N/A
Output Data           : N/A
Macros Called         : parmv
Originally Written by : Scott Bass
Date                  : 24APR2006
Program Version #     : 1.0
=======================================================================
Modification History  : Original version
=====================================================================*/

/*---------------------------------------------------------------------
Usage:

* Start benchmarking. ;
* Both invocations are identical as long as start ;
* has not been previously invoked ;
%bench;
%bench(start);

data _null_; rc=sleep(3); run;

* Get elapsed time, should be approx. 3 seconds elapsed, 3 seconds total ;
%bench(elapsed);

data _null_; rc=sleep(7); run;

* Get another elapsed time, should be approx. 7 seconds elapsed, 10 seconds total ;
%bench;  * elapsed parm not required since start was already called ;

data _null_; rc=sleep(2); run;

* End benchmarking, should be approx. 2 seconds elapsed, 12 seconds total ;
* Must be called after start. Resets benchmarking. ;
%bench(end);
-----------------------------------------------------------------------
Notes:

If %bench has never been invoked, calling %bench without parameters
starts benchmarking. You may also explicitly specify the start
parameter.
Explicitly specifying the start parameter resets benchmarking,
although normally the end parameter would be used.

If %bench has been previously invoked with the start parameter,
calling %bench without parameters prints the elapsed time. You may
also explicitly specify the elapsed parameter.

To end benchmarking and reset the start time, specify the end
parameter.

Only the elapsed or end parameters (or equivalent processing) print
time measurements to the log. The start parameter does not print
anything to the log.

The only parameter that needs to be explicitly specified is end.
Otherwise the macro should do the right thing, either starting
benchmarking or printing elapsed times.

Benchmarking a time period greater than 24 hours is "unpredictable".
---------------------------------------------------------------------*/

%macro bench
/*---------------------------------------------------------------------
Measures elapsed time between successive invocations.
---------------------------------------------------------------------*/
(PARM          /* Benchmarking parameter (Opt).                      */
               /* If not specified:                                  */
               /*    If first invocation, start benchmarking.        */
               /*    If subsequent invocation, print elapsed time.   */
               /* Valid values are START ELAPSED END.
               */
);

%local macro parmerr time_elapsed time_total time_elapsed_str time_total_str h m s;
%let macro = &sysmacroname;

%* check input parameters ;
%parmv(PARM, _req=0,_words=0,_case=U,_val=START ELAPSED END)
%if (&parmerr) %then %goto quit;

%* nested macro for printing ;
%macro print(_parm);
   %let time_elapsed = %sysevalf(%sysfunc(datetime()) - &_elapsed);
   %let time_total   = %sysevalf(%sysfunc(datetime()) - &_start);
   %let h = %sysfunc(hour(&time_elapsed),z2.);
   %let m = %sysfunc(minute(&time_elapsed),z2.);
   %let s = %sysfunc(second(&time_elapsed),z2.);
   %let time_elapsed_str = &h hours, &m minutes, &s seconds;
   %let h = %sysfunc(hour(&time_total),z2.);
   %let m = %sysfunc(minute(&time_total),z2.);
   %let s = %sysfunc(second(&time_total),z2.);
   %let time_total_str = &h hours, &m minutes, &s seconds;
   %put;
   %put Benchmark &_parm:;
   %put;
   %put Elapsed seconds = &time_elapsed_str &time_elapsed;
   %put Total seconds   = &time_total_str &time_total;
   %put;
%mend;

%* declare global variables ;
%global _start _elapsed;

%if (&parm eq START) %then %do;
   %let _start   = %sysfunc(datetime());
   %let _elapsed = &_start;
%end;
%else %if (&parm eq ELAPSED) %then %do;
   %if (&_start eq ) %then %do;
      %put ERROR: Benchmarking must be started before elapsed time can be printed.;
      %goto quit;
   %end;
   %else %do;
      %print(ELAPSED)
      %let _elapsed = %sysfunc(datetime());
   %end;
%end;
%else %if (&parm eq END) %then %do;
   %if (&_start eq ) %then %do;
      %put ERROR: Benchmarking must be started before elapsed time can be printed.;
      %goto quit;
   %end;
   %else %do;
      %print(END)
      %* reset benchmarking ;
      %symdel _start _elapsed / nowarn;
   %end;
%end;
%else %if (&parm eq ) %then %do;
   %* derive proper parm then recursively call this macro ;
   %if (&_start eq ) %then %do;
      %bench(start)
   %end;
   %else %do;
      %bench(elapsed)
   %end;
%end;

%quit:
%* if (&parmerr) %then %abort;
%mend;

/******* END OF FILE *******/

Data Step View and PROC SUMMARY approach:

%macro get_counts(data);
   * reset character used for missing numeric data ;
   %let
missing=%sysfunc(getoption(missing));
   options missing=" ";

   * create additional grouping variables ;
   data vgrouped / view=vgrouped;
      set &data indsname=dsn;

      * set a dummy variable as an end of variable list marker ;
      retain dummy "";
      drop dummy;

      * define additional variables ;
      * varname must be long enough to contain memname_varname ;
      length libname $8 memname $32 varname $65 cvalue $200 measure $15;

      * we only need to get the libname and memname once ;
      if (_n_=1) then do;
         libname=scan(dsn,1,".");
         memname=scan(dsn,2,".");
         retain libname memname;
      end;

      * spin through all the variables in the dataset, building grouping variables ;
      * since our checks are mutually exclusive, this will not cause bloating of the dataset ;
      do while (1);
         call vnext(varname);
         if (varname in ("dsn","eof")) then continue;
         if (varname="dummy") then leave;

         * get the variable value (character, formatted value) ;
         cvalue=vvaluex(varname);

         * build the grouping variable ;
         select;
            when (missing(cvalue))     measure="Missing";
            when (strip(cvalue)="N/A") measure="Not_Applicable";
            when (not missing(cvalue)) measure="Not_Missing";
            /* this covers all possibilities, so I purposely left out an otherwise statement */
         end;

         * build the new variable name ;
         varname=catx("_",memname,varname);

         * output the observation ;
         output;
      end;
   run;

   * now summarize over each measure to get the frequency counts ;
   proc summary data=vgrouped nway;
      class libname memname varname measure;
      output out=summary (drop=_type_);
   run;

   * transpose data ;
   proc transpose data=summary out=transposed (drop=_name_);
      by libname memname varname;
      id measure;
      var _freq_;
   run;

   * set desired PDV order, ensure all variables are present, and replace missing values with zero ;
   data missing2zero;
      format libname memname varname;
      length Missing Not_Applicable Not_Missing 8;
      set transposed;
      array miss{*} Missing -- Not_Missing;
      do i=1 to dim(miss);
         if miss{i}=.
         then miss{i}=0;
      end;
      drop i;
   run;

   * append data ;
   proc append base=final1 data=missing2zero;
   run;

   * restore character used for missing numeric data ;
   options missing="&missing";
%mend;

proc datasets lib=work nolist nowarn;
   delete final1:;
quit;

%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)

* this sort isn't required, but makes it easier to compare the proc print outputs ;
proc sort data=final1;
   by libname memname varname;
run;

* one last transpose. use whichever dataset you prefer, final1 or final1a ;
proc transpose data=final1 out=final1a (rename=(_name_=Num_Obs));
   id varname;
run;

title;
proc print data=final1;  run;
proc print data=final1a; run;
%bench(end)

Hash Object, setting and retrieving frequency counts:

This approach only processes the data once, but has the overhead of finding and setting the frequency counts for every data value in the source data. The hash object never gets that big: up to three rows (Missing, Not_Applicable, Not_Missing) per variable.
%macro get_counts(data);
   * reset character used for missing numeric data ;
   %let missing=%sysfunc(getoption(missing));
   options missing=" ";

   data _null_;
      set &data indsname=dsn end=eof;

      * set a dummy variable as an end of variable list marker ;
      retain dummy "";

      * define additional variables ;
      * varname must be long enough to contain memname_varname ;
      length libname $8 memname $32 varname $65 cvalue $200 measure $15 count 8;

      * use a hash object to store summary data ;
      if (_n_=1) then do;
         dcl hash sums(hashexp: 16);
         sums.defineKey( "libname","memname","varname","measure");
         sums.defineData("libname","memname","varname","measure","count");
         sums.defineDone();

         * get libname and memname ;
         libname=scan(dsn,1,".");
         memname=scan(dsn,2,".");
         retain libname memname;
      end;

      * spin through all the variables in the dataset, building summarization variables ;
      do while (1);
         call vnext(varname);
         if (varname in ("dsn","eof")) then continue;
         if (varname in ("dummy")) then leave;

         * get the variable value (character, formatted value) ;
         cvalue=vvaluex(varname);

         * set the hash object keys (libname and memname are already set) ;
         * derived varname ;
         varname=catx("_",memname,varname);

         * measure ;
         select;
            when (missing(cvalue))     measure="Missing";
            when (strip(cvalue)="N/A") measure="Not_Applicable";
            when (not missing(cvalue)) measure="Not_Missing";
         end;

         * initialize counter back to 0 ;
         count=0;

         * retrieve the current key and increment it ;
         * if find fails (first time through) count is still 0 ;
         rc=sums.find();

         * increment the count ;
         count=count+1;

         * save the incremented count ;
         sums.replace();
      end;

      * now output the hash object as a dataset ;
      if eof then sums.output(dataset: "counts");
   run;

   * append data ;
   proc append base=final2 data=counts;
   run;

   * restore character used for missing numeric data ;
   options missing="&missing";
%mend;

proc datasets lib=work nolist nowarn;
   delete final2:;
quit;

%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)

* this sort isn't required, but makes it easier to compare the proc print outputs ;
proc sort data=final2;
   by libname memname varname;
run;

* one last transpose. use whichever dataset you prefer, final2 or final2a ;
proc transpose data=final2 out=temp (drop=_name_);
   by libname memname varname notsorted;
   id measure;
run;

* set PDV order, which sets final observation order in final2 ;
* could use a view here but it is such a tiny dataset ;
data temp;
   format libname memname varname Missing Not_Applicable Not_Missing;
   set temp;
run;

proc transpose data=temp out=temp2 (rename=(_name_=Num_Obs));
   id varname;
run;

* replace missing values with zero ;
data final2a;
   set temp2;
   array miss{*} _numeric_;
   do i=1 to dim(miss);
      if miss{i}=. then miss{i}=0;
   end;
   drop i;
run;

title;
proc print data=final2;  run;
proc print data=final2a; run;
%bench(end)

Hash Object, using suminc increment variable:

This approach also only processes the data once.
%macro get_counts(data);
   data _null_;
      set &data indsname=dsn end=eof;
      retain dummy "";
      length libname $8 memname vname $32 varname $65 measure $20 cvalue $200;
      if (_n_=1) then do;
         dcl hash sums(suminc:"count");
         sums.defineKey("libname","memname","varname","measure");
         sums.defineData("libname","memname","varname","measure","count");
         sums.defineDone();
         dcl hiter iter("sums");
         libname=scan(dsn,1,".");
         memname=scan(dsn,2,".");
         retain libname memname;
      end;
      do while (1);
         call vnext(vname);
         if (vname in ("dsn","eof")) then continue;
         if (vname in ("dummy")) then leave;
         cvalue=vvaluex(vname);
         select;
            when (missing(cvalue))     measure="Missing";
            when (strip(cvalue)="N/A") measure="Not_Applicable";
            when (not missing(cvalue)) measure="Not_Missing";
         end;
         varname=catx("_",memname,vname);
         count=1;
         if (sums.find() ne 0) then sums.add();
      end;
      if eof then do;
         rc=iter.first();
         do while (rc=0);
            sums.sum(sum: count);
            sums.replace();
            rc=iter.next();
         end;
         sums.output(dataset: "counts");
      end;
   run;

   proc append base=final3 data=counts;
   run;
%mend;

proc datasets lib=work nolist nowarn;
   delete final3:;
quit;

%bench(start)
%get_counts(work.table1)
%get_counts(work.table2)
%get_counts(sashelp.class)
%get_counts(sashelp.cars)
%get_counts(work.zipcode)
%bench(elapsed)

* this sort IS required, to group the varnames in the next transpose ;
proc sort data=final3;
   by libname memname varname measure;
run;

* one last transpose. use whichever dataset you prefer, final3 or final3a ;
proc transpose data=final3 out=temp (drop=_name_);
   by libname memname varname;
   id measure;
run;

* set PDV order, which sets final observation order in final3 ;
* could use a view here but it is such a tiny dataset ;
data temp;
   format libname memname varname Missing Not_Applicable Not_Missing;
   set temp;
run;

proc transpose data=temp out=temp2 (rename=(_name_=Num_Obs));
   id varname;
run;

* replace missing values with zero ;
data final3a;
   set temp2;
   array miss{*} _numeric_;
   do i=1 to dim(miss);
      if miss{i}=.
         then miss{i}=0;
   end;
   drop i;
run;

title;
proc print data=final3;  run;
proc print data=final3a; run;
%bench(end)

After running all three approaches, you can compare the "A" datasets. The "not A" datasets are not normalized the same way, so they don't compare.

proc compare base=final1a compare=final2a;
run;
proc compare base=final1a compare=final3a;
run;

To print totals, there are a number of ways to do that. I'll use PROC REPORT:

options nocenter;

proc report data=final1 nowd;
   columns libname memname varname missing not_applicable not_missing total;
   define libname / order;
   define memname / order width=12;
   define varname / order width=25;
   compute total;
      total=sum(missing.sum,not_applicable.sum,not_missing.sum);
   endcomp;
   break after memname / ol summarize skip;
quit;

proc report data=final1a nowd;
   rbreak after / ol summarize skip;
   label num_obs=" ";
quit;

Finally, after running all three approaches, there wasn't a big difference in performance between any of them. Based on this, I'd recommend the proc summary approach, since it's the simplest of the three I listed.

Also, I did not compare the performance with the original sql/union all approach, since 1) the previously posted code wasn't generic enough to support datasets in multiple libraries and I didn't feel like fiddling with the code, 2) with enough datasets and variables you'd likely run into the limits of sql, and 3) the original poster said he'd like to split the output into multiple output datasets, which would be easier with the macro/proc append approach. If someone wants to compare the performance with the original sql approach, please post the results.

Hope this helps, and sorry again for the length...

Scott