03-03-2016 10:45 PM
Hello, I am working on creating a SAS code that will automate a report on Tuberculosis cases and contacts. I have some sample code that I have been working on but it is not looking how I the report looks. I have attached report that I am trying to create. I have also attached my SAS code so far. I can attach the dataset if someone needs it.
options center ps=60 ls=120; /************************************************************** */ /*File: AutoCode to convert Excel Data to SAS data set */ /*Date: 01/21/2016 */ /*Data: Data from C:\Users\Shaquina\ */ */ AutoCode.xlsx */ */Create: C:\Users\Shaquina\AutoCode.sas7bdat */ /****************************************************************/ ; * First: Import data using the import wizard 1. Save file to be imported as a CSV(MS-DOS) file. 2. In SAS, click on File, Import Data... 3. From the drop down menu, click on Comma Separated Values (*.csv). 4. Choose the file destination and click next. 5. Choose library and name the new sas file and click next. 6. Click finish. 7. Proc print data= XX.XXX (obs=5) *This will confirm you successfully uploaded the file**; ******************************************************************************************************; libname Danielle 'C:\Users\Shaquina\'; Title color= black "Tuberculosis Control Program-Arkansas"; title2 color= black "Preliminary"; title3 color = black "Aggregate Reports for Tuberculosis Program Evaluation:"; title4 color = black "Follow-up and Treatment for Contacts to Tuberculosis Cases"; title5 color = black "Reporting Area AR0000"; title6 color = black "Cohort Year: 2014"; title7 color = black "Closure Date for Follow-up: 08/15/2015"; title8 color = black "Total TB Cases Reported: 93"; run; footnote ' '; *merging the data; PROC PRINT data = Danielle.AutoCode1 (obs=5); run; proc print data = Danielle.AutoCode_2 (obs=5); run; proc sort data=Danielle.Autocode1; by StateCaseNo; run; proc sort data= Danielle.Autocode_2; by StateCaseNo; run; data Danielle.AutoCodeAll_; merge Danielle.AutoCode1 Danielle.AutoCode_2; by StateCaseNo; run; proc print data = Danielle.AutoCodeAll (obs=5); run; Data Danielle.NewAutoCode_1; set Danielle.AutocodeAll; if Sputum_Smear_Result= 'Positive' or Sputum_Smear_Result= 'Negative'; if Therapy_Stop_Reason = 'Completed Therapy'; run; proc contents data = Danielle.NewAutoCode_1; run; *Part 1. Cases and Contacts; proc tabulate data =Danielle.NewAutoCode_1 missing; TITLE9 "Types of Cases for Investigation" ; table DispositionClass_Desc="Cases For Investigation" Therapy_Stop_Reason, Sputum_Smear_Result="Sputum Smear"; Class Sputum_Smear_Result DispositionClass_Desc Therapy_Stop_Reason; run; *Reasons Treatment Not Completed; title ''; data Danielle.NewAutoCode_2; set Danielle.AutoCodeAll; if Sputum_Smear_Result='Positive' or Sputum_Smear_Result='Negative'; run; proc tabulate data = Danielle.NewAutoCode_2; table Therapy_Stop_Reason='Reason Treatment Not Completed:', Sputum_Smear_Result='Sputum Smear'; Class Sputum_Smear_Result Therapy_Stop_Reason; run; Footnote ' Public reporting burden for this collection of information is estimated to average 3 hours per manual by data clerks; 30 minutes per hour manual response by program managers; 30 minutes per electronic response by data clers and program managers, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. An agency may not conduct or sponsor, and a person is not required to respond to a collection of information unless it displays a currently valid OMB control number. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden to CDC/ASTDR Information Collection Review Office, 1600 Clfton Road, MS-D-24 Atlanta, GA 30333, ATTN: PRA(0920-0457). Do not send the completed form to this address.'; run;
03-04-2016 06:03 AM - edited 03-04-2016 06:04 AM
In principle, you can't create this type of output as a default part of the TABULATE function; in essence, you are asking for two different table definitions. Anything you do with the SAS syntax will basically amount to adding more dimensions to the table, but it won't fix your core problem.
You can use this code to get the tables you want, but they're still different tables:
PROC TABULATE DATA=want NOSEPS;
VAR stake winnings margin;
TABLE (stake winnings),(N SUM MEAN MEDIAN STDDEV MIN MAX);
TABLE (margin),(N MEAN MEDIAN STDDEV MIN MAX);
There are some guides out there on hacking ODS to do what you want (namely, create "stacked tables" where several child tables are assembled into a single table. Check out here for an example. If you Google "SAS stack tables" you'll find more examples.
I've done this in HTML by creating a new tagset - basically, a special ODS destination that removes spaces between tables, etc. I don't have the code that I used anymore, unfortunately; I moved to R to do automated reporting.
03-04-2016 11:55 AM
Please be explicit about which part(s) "is not looking how the report looks".
Without actual data it will be next to impossible to say what may be needed to generate your report. If you provide some data please make in in the form of a data step which reads datalines. If we have to "import" an excel file the results may be different.
Also anything appearing in the body of a report is likely to need a variable of some sort. So the parenthetical elements such as (a1) may be difficult to incorporate.
Also does the final result have to be in Excel? There can be major headaches getting some strickly appearance options such as the dot leaders ( ......... ) to appear as desired.
And a not about repeated processes:
First: Import data using the import wizard 1. Save file to be imported as a CSV(MS-DOS) file. 2. In SAS, click on File, Import Data... 3. From the drop down menu, click on Comma Separated Values (*.csv). 4. Choose the file destination and click next. 5. Choose library and name the new sas file and click next. 6. Click finish. 7. Proc print data= XX.XXX (obs=5) *This will confirm you successfully uploaded the file**;
If you are going to do this with multiple files, either different data sources or over time, with the same layout do 1 through 5 one time.
The log will have data step code that you can extract and save as a program file and then you only have to change the source file, output set and possible the library. You control informats and especially length of character variables and whether a variable is numeric or character as well as setting variable labels formats, add some data checking for ranges or such and possibly calculating additional needed variables. If you keep using proc import I am willing to make a small wager that eventually something doesn't come in as expected because one of your data sources provides unexpected values an a variable that was defaulting to numeric suddenly becomes character. Which may mean a failure later on.
Whether you could hope to make such tables in a single pass through Proc tabulate depends on your data and many definitions. I've had multiple go-arounds with CDC over what the appropriate numerator and denominators may be for percentages. Often they lead to serious preprocessing before getting to actually preparing the tables.
03-04-2016 02:35 PM
To clarify my question:
The final result does not have to be in EXCEL. I attached the excel dataset, rather than datalines, because that it what we will always be using to create the report. When I stated it is not looking how the report looks, I was referring to the table values. I am not certain how to pull only values from a variable that I need. A little more about the dataset. All TB "contacts" are listed by the StateCaseNo. Repeated StateCaseNo means that is the same contact for one TB case. The purpose of the report is to use the contact's datasheet, which I have attached, to make a report showing how many contacts, how many cases, how many contacts got the disease (disease status is listed in the DispositionClass_Desc variable; blanks indicate not infected), how many contact got the infection (once again, it's listed in DispositionClass_Desc), how many started treatment (I'm assuming we could tell this by looking a started treatment date variable), and how many completed treatment (i used the therapy_stop_reason variable). The "others" variable on the report are referring to kids ages 2 and under. I do not know how to get the table to format like the report I attached on my initial post. I was going to use the Therapy_Stop_Reason variable for the middle table of the report. The last table calculates rates using the same variables from the 1st table. This is the part where I have no idea what to do. The (b1/a1),% next to the numbers in the table I am not concerned about. I am really just wanting the calculation rates. Please let me know if I cleared this up a little or if it is still vague. I have copied and pasted the first few lines of the dataset below. The enitre dataset is attached. Thank you!
|2014ARAR1400001||1||Positive||Positive||1 Positive Culture||12JAN14:00:00:00||13JAN14:00:00:00||Confirmed||19JAN14:00:00:00||Pulmonary||25||No||No||Abnormal||17JAN14:00:00:00||Count as a TB Case||.||17JAN14:00:00:00||17JAN14:00:00:00||Pulmonary||04JUN14:00:00:00||Yes||No||Susceptible||Susceptible||Susceptible||Susceptible||Susceptible||Unemployed||Male||UNITED STATES||Not Hispanic or Latino||28JUN88:00:00:00||71801||Y||No||Black or African American||Alive||18||Positive||Hempstead County||Positive||27JAN15:00:00:00||Completed Therapy||Yes||Yes||Yes||No||Yes||No||No||Not Done||Not Done||.||No||Not Done||Positive||Yes||Hope||United States||903 North Walker||Not Done||18JUN14:00:00:00||Yes||Household||No||AR||.||UNITED STATES||Black||Non-Hispanic||Male||No||No||No||No||Close||House||Household||.||.||4-Mar-14||Positive||T-Spot||.|
|2014ARAR1400001||Positive||Positive||1 Positive Culture||12JAN14:00:00:00||13JAN14:00:00:00||Confirmed||19JAN14:00:00:00||Pulmonary||25||No||No||Abnormal||17JAN14:00:00:00||Count as a TB|