About ballardw

ballardw · ‎06-30-2025

By "multiple sessions" do you mean concurrently? As in multiple SAS jobs running at the same time, possibly by different users? That is likely to be a bit tricky, as several jobs may attempt to access the same data set at the same time. If by "multiple sessions" you mean on different days, then Proc Append would work just fine, unless you have the same problems with changing variable types/names as in your other posts.

ballardw · ‎06-28-2025

What are the current SAS system settings for Linesize and Papersize? What is the total of the defined lengths for the ID variables? For best results, any time you ask a question about an Error message you should include the log showing the submitted code along with all the messages from the step that throws the error. If the code might be in a macro, then set option MPRINT before executing the macro to generate more details, so that the error message is more likely to appear near the problem code. Copy the log, then on the forum open a text box using the </> icon above the message window and paste all the copied text. The text box helps preserve the formatting of the log text and will visually separate the log from other commentary or response text.
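A minimal sketch of the two checks mentioned above, assuming an interactive session where the log is visible. Nothing here is specific to the original poster's program:

```sas
/* Show the current values of the two system options asked about */
proc options option=(linesize papersize) value;
run;

/* Write the SAS code generated by macros to the log, so that
   error messages appear next to the statements that caused them */
options mprint;
```

Submit the OPTIONS statement before calling the macro, then copy the resulting log for the forum post.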

ballardw · ‎06-25-2025

@Callam1 wrote: Thank you. Is the proc summary quicker than proc sort?

A likely issue with Proc Summary is getting the values for "other columns", as you said. Every variable would have to be referenced somewhere in the Proc Summary, and that might be a headache.

ballardw · ‎06-25-2025

One way to create groups for analysis is a custom format. There are advantages to using a format if the "class" or "group" is based on a single variable. The first is that you do not have to create any other variables, so concerns about which data set holds which recoded variable are not an issue.

It is a bit hard to tell if this suggestion is appropriate because the code you show throws errors, at least in my version of SAS:

    29   proc logistic data=two_class ;
    30   model class1( event='1' ) = SepalWidth SepalLength PetalWidth
    30 ! PetalLength ;
    31   run ;
    ERROR: All observations have the same response. No statistics are computed.
    NOTE: The SAS System stopped processing this step because of errors.
    NOTE: There were 150 observations read from the data set WORK.TWO_CLASS.
    NOTE: PROCEDURE LOGISTIC used (Total process time):
          real time 0.00 seconds
          cpu time 0.00 seconds

If you are trying to model combinations of the characteristics to predict species, I'm moderately sure that Proc Logistic isn't the right procedure.

A format approach might look like:

    proc format ;
       value $setosa
          'Setosa' = 'Iris Setosa'
          other    = 'Others' ;

    proc logistic data=sashelp.iris;
       model species = SepalWidth SepalLength PetalWidth PetalLength ;
       format species $setosa.;
    run;

This doesn't throw an error but does have separation-of-data issues. Note in the example the keyword OTHER, which assigns all values not explicitly listed to a single response.

A drawback to formats is creating the formats and making sure they are available in the current session (run the Proc Format code, or create format catalogs in a permanent library that is in the format search path).

Other advantages of formats include:

1) Time. Especially if you have very large data sets, it can take time (and storage space) to add the additional variables to a data set.
2) Ease of changing ranges of definitions, especially for numeric values. (Ranges using the < in Proc Format are very problematic for general character values but may work for some fixed-length, single-case values.)
3) For a fair number of examples the code for Proc Format may be simpler than data step code.
4) Specially structured data sets can be used to create formats.

The limitation is that only a single variable can be involved. If missing values are to be excluded and not treated as the "Other" category, an appropriate value clause such as . = 'Missing' or ' '='Missing' may be needed.

Relying on the stored catalogs for use and not maintaining the code to create the formats can be problematic when moving to different versions of SAS. Best is to use the Proc Format CNTLOUT= option to create data sets that can be used to recreate the formats, and keep the location handy.

An example of the data set to create three formats, one for each of the species:

    proc sql;
       create table temp as
       select distinct(species)
       from sashelp.iris ;
    quit;

    data iriscntrl;
       set temp;
       by species notsorted;
       /* check the documentation for the rules on format names;
          making these may be the hardest part of an automated process */
       fmtname = strip(species);
       type    = 'C';
       start   = species;
       label   = catx(' ','Iris',species);
       output;
       if last.species then do;
          call missing (start);
          label = 'Other';
          /* HLO holds additional information on how the format is used;
             capital O is for the Other instruction */
          hlo   = 'O';
          output;
       end;
    run;

    proc format cntlin=iriscntrl;
    run;

More complicated code is possible. In some cases you would want to sort the control set by the format name to make sure all the start values for a format are together in the set. Otherwise the Proc Format results may be odd or the step may fail.
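The CNTLOUT= round trip recommended above could look roughly like the sketch below. The data set name fmt_backup is hypothetical, and the WORK library is used only so the example runs anywhere; in practice the control data set would go in a permanent library.

```sas
/* Define the format so the example is self-contained */
proc format;
   value $setosa
      'Setosa' = 'Iris Setosa'
      other    = 'Others';
run;

/* Write the definition of that one format to a data set
   (fmt_backup is a hypothetical name) */
proc format cntlout=work.fmt_backup;
   select $setosa;
run;

/* Later, or under a different SAS version: rebuild the format
   from the data set instead of relying on the stored catalog */
proc format cntlin=work.fmt_backup;
run;
```

Keeping the control data set alongside the program means the format can be recreated even if the catalog cannot be read.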

ballardw · ‎06-21-2025

Save the LOG of the program. Often there are notes or warnings about what goes on in addition to the errors. Those may tell us what to look at. If any of the sources or targets used are in external databases, there may be additional settings to discuss. I don't use Studio or know your entire environment. I do know that for many years the default settings could differ between an interactive session and what we used to call "batch" processing. It may be that your schedule behaves somewhat like a batch program and needs one or more system options adjusted to run in the scheduled mode.

ballardw · ‎06-20-2025

If a data step fails because of a structure difference, Proc Append is very likely to fail as well. You didn't answer anything about the actual source/contents of the daily file, just mentioned that the code is in the same program.

Proc Append by default will fail to append when:

- A variable in the appending data set is not in the base.
- A variable in the appending data set has a different type than a variable of the same name in the base file (i.e. numeric vs character).
- A variable in the appending data set has a larger length than a variable of the same name in the base file. A smaller length will generate a warning.

Proc Import is often a cause of these differences, as Import makes separate guesses about the properties of variables each time it runs, and the names and number of variables it creates will change if the source file changes. The same applies when reading from any external database where the properties/columns of the data files change.
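A small sketch of the failure conditions listed above, using two made-up data sets (work.base and work.daily are hypothetical names). It compares the structures first and then appends with the FORCE option, which tells Proc Append to go ahead despite the differences:

```sas
/* Hypothetical base data set: name is $8, no height variable */
data work.base;
   length name $8;
   name = 'Alfred';
   age  = 14;
run;

/* Hypothetical daily file: name is longer and height is extra */
data work.daily;
   length name $12;
   name   = 'Bartholomew';
   age    = 15;
   height = 62;
run;

/* Compare the structures before appending */
proc contents data=work.base  varnum; run;
proc contents data=work.daily varnum; run;

/* Without FORCE this append would stop with an error.
   With FORCE, height is dropped and name is truncated to 8 characters. */
proc append base=work.base data=work.daily force;
run;
```

FORCE discards and truncates data, so it hides the structure problem rather than fixing it. Making the daily file match the base layout is usually the safer route.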

ballardw · ‎06-19-2025

Can you provide a link or an image similar to what you want to create? I can think of several ways to interpret this "with two scatter plots on both left and right side, and subgroup in the middle", and experience tells me that the first two tries seldom match the desired result.

ballardw · ‎06-18-2025

I generally use the /* comment */ style of comment unless there is a very specific reason to have the comment appear in the log. They work in both macro and normal code. Plus the SAS enhanced editor makes adding and removing them easy with Ctrl / (comment an entire line) and Shift Ctrl / (remove the line comment). Caution: it is possible to accidentally nest comments with the editor and blocks of text, and that can be a bad thing. I also tend to put comments on different lines than actual code to avoid the issues involving the editor line comment keys. But that may just be me.

One of the issues with the basic inline comment style of * some comment text; happens when the comment involves a single quote, which might happen when saying something like "In the following code I'm looking to do X", or macro triggers like % or & .
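A short illustration of the two comment styles discussed above, in open code. The data set work.demo and the variable bmi are made up for the example:

```sas
/* Block comment on its own line: safe in open code and in macros */
data work.demo;
   set sashelp.class;
   /* I'm free to use an apostrophe, a % or an & inside this style */
   bmi = (weight / height**2) * 703;
run;

* Statement comment: it ends at the first semicolon and is tokenized;
* so an apostrophe or a macro trigger in this style is risky;
proc means data=work.demo;
   var bmi;
run;
```

An unmatched apostrophe in the statement style would start a quoted string and swallow the code that follows it.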

ballardw · ‎06-18-2025

I seem to recall there are two different warning messages. The one you are seeing is the heads up maybe 45 days. Then you get one with the DATE that things will stop working at about 15 days. Or I could be remembering incorrectly as retirement is starting to rot my memory...

ballardw · ‎06-18-2025

Or assign a label with extra underscores:

    proc report data=test nowd;
       columns _all_;
       define _all_ / display ;
       format _type_ $char3.;
       label _freq_='__freq__' _type_='__type__';
    run;

ballardw · ‎06-17-2025

The NOTE about the quoted string in the log below indicates a very likely mismatched quote somewhere before the code you show:

    1          FILENAME _DATAOUT TEMP;
    12         %LET SYSCC=0;
    13         %LET _CLIENTAPP='SAS Studio';
    14         %LET _CLIENTAPPABREV=Studio;
    15         %LET _CLIENTAPPVERSION=3.81;
    16         %LET _CLIENTVERSION=3.81;
    NOTE: The quoted string currently being processed has become more than 262 bytes long. You might have unbalanced quotation marks.
    17         %LET _CLIENTMODE=wip;
    18         %LET _SASSERVERNAME=%BQUOTE(SASApp);
    19         %LET _SASHOSTNAME=%BQUOTE(odaws01-usw2);
    20         %LET _SASPROGRAMFILEHOST=%BQUOTE(odaws01-usw2);
    21         %LET _CLIENTUSERID=%BQUOTE(u58028236);
    22

And since none of those data steps show anything about observations or run time, they did not execute at all, which would explain why nothing was exported. The repeated notes about the quoted string mean that all the "code" is actually inside something that has been quoted and is never executed.

One other thing to look at is your macro definition. Use of inline comments such as

    *Step 5: Create a frequency table;

can cause problems. The inline comment for macros is %*. Either this or the /* comment */ style of comment should be used inside a macro definition.

From the documentation on macro comments (Comparisons section): SAS comment statements of the form *commentary; or comment commentary; are complete SAS statements. Consequently, they are processed by the tokenizer and macro facility and cannot contain semicolons or unmatched quotation marks.
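A sketch of a macro definition that uses only the two comment styles recommended above. The macro name freq_demo and its parameters are hypothetical and are not taken from the original poster's program:

```sas
%macro freq_demo(dsn=sashelp.class, var=sex);
   %* Macro comment: handled by the macro processor and ended by the semicolon;
   /* Block comment: not tokenized, so don't and 50% are harmless here */
   proc freq data=&dsn;
      tables &var;
   run;
%mend freq_demo;

/* Call the macro with its default parameter values */
%freq_demo()
```

The %* style still ends at the first semicolon, so it is best kept free of quotes and semicolons as well.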

ballardw · ‎06-16-2025

@Season wrote: Does importing the file with the DATA step necessitates specification of the name and informat of each of every variable in the CSV? That would be a very formidable job as I have possibly thousands of columns in all.

Proc Import will write a basic data step program to read a text file. The code will be in the log and can be copied from the log to the editor, cleaned up and rerun (or issue a RECALL command immediately after the Proc Import to bring the code into the editor).

The informats that most often need to be addressed are those where the value contains all digits but you want to maintain leading zeros, such as account numbers. Change the informat to a character one long enough to hold the value (a $20. or similar). In most cases when modifying a Proc Import generated data step you can drop the FORMAT statements for variables, except for date, time and datetime variables, unless you want to assign custom formats.

Also look out for columns with mixed use of negative signs and () for negative values, or currency and percent signs that aren't on every value. These may require additional coding as well, as in read as character and parse. If you have multiple currency symbols such as dollar, Yen, Pound, Franc and such, this might be a very important consideration if you want to manipulate the currency values in any consistent manner.

Check the assigned informat for your problem variables. If they were read as character but should be dates, that is an indication that you may need to create new variables by parsing the values. Check your national language settings (NLS) to see in what order dates are read. Or, if you see lots of invalid data messages involving those variables, that is one indicator that the order may be different than your NLS expects, and you should override to read as character and parse.

If you have variables that would best be considered Boolean, i.e. Yes/No, True/False and such, it may be worth creating and using a custom informat so that the results are numeric 1/0, as that will be much easier to work with in most cases going forward than a hodgepodge of Y/N T/F character values.

Another consideration not mentioned yet: if these files are supposed to be of the same layout, you should be able to use the same data step to read all of them by changing the name of the input file and the output data set. But it is very likely that lengths of character variables will differ between files, so modify any of the $w. informats to allow for this. I generally start at 15% or so longer than the generated data step, and then check after reading that the values look right. If not, make the informat wider and re-read.

A last issue relates to variable names generated from column headings that are either very long (they will get truncated at 32 characters) or identical in the source file. If column headings are identical for the first 32 characters of a longer heading, the first will get that part of the text as the variable name. The others will get VARxxxx, where xxxx may be the column number in the file. Identical shorter headings may get numeric suffixes added. For example, in a file with multiple headings of "Total", the first will have a variable name of Total, the next Total2 (or Total1; it has been a while), with incremented numbers for each one following.

I recommend setting option VALIDVARNAME=V7 before Proc Import. Dealing with variable names with spaces and non-standard characters gets old real quick when you have to use a name literal such as 'Stupid variable name'n everywhere. The V7 option will replace all the special characters with _, and the names will be easier to type (or rename as desired).

One tool to help deal with some of this, if you don't have good documentation, is to copy the header row of the CSV, assuming it has column headers, and paste that TRANSPOSED into a spreadsheet. That will give you one "row" per variable, so you can examine how long the variable names might be and whether different files have different headers (paste into a different column in the spreadsheet and run a comparison of values). If you have a source with narrative column headings you can, with a little work in the spreadsheet, get it to create LABEL assignment statements for the variables: paste the variable names from the Proc Import generated data set into another column (using either the INPUT statement from the code or Proc Contents output) and use spreadsheet functions to create text like varname ="original column heading text goes here".

Any data source that may have "thousands of columns" and doesn't provide documentation of the file content, such as expected lengths of character variables and layouts of date, time or datetime values, needs to be considered with great suspicion. Without documentation, how do you know what anything represents?
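A sketch that pulls several of the points above together. The three-line CSV, the informat name yesno and all variable names are made up for illustration, and the data step is hand-written in the style of what Proc Import puts in the log rather than copied from any real log:

```sas
/* Build a tiny stand-in CSV so the example is self-contained */
filename demo temp;
data _null_;
   file demo;
   put 'Account ID,Visit Date,Consent,Comment';
   put '000123,2025-06-16,Yes,first visit';
   put '004567,16JUN2025,n,';
run;

/* V7 names: Account ID becomes Account_ID, and so on */
options validvarname=v7;

/* Let Proc Import guess; the data step it generates appears in the log */
proc import datafile=demo out=work.guess dbms=csv replace;
   getnames=yes;
   guessingrows=max;
run;

/* Hypothetical custom informat: assorted Yes/No text to numeric 1/0 */
proc format;
   invalue yesno (upcase just)
      'Y', 'YES', 'T', 'TRUE'  = 1
      'N', 'NO',  'F', 'FALSE' = 0
      ' '                      = .
      other                    = _error_;
run;

/* Cleaned-up data step in the style of the generated code */
data work.clean;
   infile demo dsd dlm=',' firstobs=2 truncover;
   informat Account_ID $20.        /* character keeps the leading zeros */
            Visit_Date anydtdte32. /* tolerates mixed date layouts */
            Consent    yesno.      /* numeric 1/0 result */
            Comment    $60.;       /* wider than the current longest value */
   format Visit_Date date9.;
   input Account_ID Visit_Date Consent Comment;
run;
```

Comparing work.guess with work.clean using Proc Contents shows which guesses were overridden. For other files with the same layout, only the fileref and the output data set name would need to change.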

ballardw · ‎06-16-2025

What does the LOG show when you run this code? It is usually best to include the LOG for the code you are asking about. Copy the text from the log including all the messages or notes. Then on the forum open a text box by clicking on the </> icon above the main message window and paste the text. The log will often show why no output was created. My first suggestion would be to double check whether the data set Nhanes_dict was created and has any observations.

ballardw · ‎06-16-2025

@Kathryn_SAS wrote: ...If you have different formats in a column, you need it to be imported as character and then you can make changes after the data is read in.

I've had fairly good luck using the ANYDTDTE, ANYDTDTM or ANYDTTME informats with Proc Import generated code when date, datetime or time values appear in different layouts for a single variable. But I don't remember any cases where a space was the delimiter between the data elements. This approach works best when the years are 4 digits. With 2 digit years you have to pray that the order of values entered matches the national language settings.
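A small sketch of the ANYDTDTE behavior described above, using made-up values. The DATESTYLE= setting shown is an assumption for the example; in a real session it normally follows the locale:

```sas
/* Assumed setting: read ambiguous values as month-day-year */
options datestyle=mdy;

data work.mixed_dates;
   input raw $ 1-20;
   /* One informat for several layouts in the same column */
   d = input(raw, anydtdte20.);
   format d date9.;
   datalines;
2025-06-16
16JUN2025
06/16/2025
01/02/03
;

proc print data=work.mixed_dates;
run;
```

The last value has a 2 digit year and no unambiguous field, so how it is read depends on the DATESTYLE= setting rather than on the data.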

ballardw · ‎06-15-2025

You need to be a bit more specific as to what the "report" would contain. For example, a report that summarizes numeric values by some sort of grouping variables would quite likely display far fewer than 20,000 rows. What are the rules for "coloring cells"? This is likely crucial. If the coloring is based only on the content of the cell itself, then Proc Print may well be a much better option for that much output. If you are displaying all 20,000 observations, that is roughly the equivalent of 250 pages of output, depending on page and font sizes. Who is going to "read" these 250 pages?
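If the coloring rule depends only on the cell's own value, one possible Proc Print approach is a format that maps values to colors. This is only a sketch: the format name hgtcolor, the cutoff of 60 and the use of sashelp.class are assumptions standing in for the real data and rules.

```sas
/* Hypothetical rule: color a cell by its own value */
proc format;
   value hgtcolor
      low -< 60 = 'lightyellow'
      60 - high = 'lightgreen';
run;

proc print data=sashelp.class noobs;
   var name age;
   /* The format supplies the background color for each height cell */
   var height / style(data)={background=hgtcolor.};
run;
```

Rules that depend on other columns in the same row would need something other than this value-to-color format.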
