I am using a dataset from PRAMS. I imported the file which has several different tabs for different variables. Under each tab, there are different survey questions in each individauls from each state replied. Since they are on the same tab however, SAS is creating one giant table that includes each survey question for each variable. This is causing there to be up to 800 obs.
I am also wondering how I can code this so that I have a select few states from the set instead of all of them. Below you will find my code. I clearly am new to this, so any advice would be appreciated. Thanks!
I don't have any insight in PRAMS, and maybe not so many other here at he communities, so a bit more details would be helpful to understand your challenge.
"SAS is creating one giant table that includes each survey question for each variable."
Please provide a PROC CONTENTS to describe your imported data set.
For the State filter, isn't there a variable that holds that information?
Again, more detail required to help you.
I am a bit familiar with PRAMS. Apparently the access to the public use data sets requires some set up that I'm not going to attempt to get access at this time.
I am a bit surprised that @SAS0005 is dealing with such a spreadsheet as most of the CDC sponsored projects like this will also have data available in SAS data set forms and often has analysis code which can be pretty critical for the sample designs of such surveys.
I suggest that you go back to the source of your data and first see if you can get SAS data directly as Proc Import, or somewhat better the LIBNAME EXCEL or XLSX engines may have issues with inconsistent variable lengths and possibly types for the same variables given the way that the import and Libname engines work with Excel files.
It appears you've sourced your Excel from https://www.cdc.gov/prams/prams-data/form/arf-access-form.htm under the Selected ...Indicators section. These are REPORTS and not in a form that's ready made for import into a table for further processing.
Post processing such imports of Excel reports to get to a proper rectangular structure required by tables can be tedious (and would be for these reports).
If I understand this right then you can register with PRAMS and get access to more detailed and much more suitable source data for use with SAS. Based on the codebooks it appears PRAMS is also using SAS.
@SAS0005 I still recommend that you register with Prams and get the proper data for analysis. But just for fun and to improve my Python skills a bit...
When downloading and looking into these prams Excel reports found here I realised that for each table a named ranges had been defined (Table1, Table2, etc.) and that all tables in all sheets have the same structure. In the following an approach how to read such Excel data into a SAS table.
I've run the Python script separately but there are also various methods how to run a Python script directly out of a SAS session (available methods depend on SAS version) or alternatively you can also call SAS out of Python.
Step 1: Using Python read each table (range) into a data frame and then export to a .csv
import pandas as pd from openpyxl import load_workbook from openpyxl.utils.cell import range_boundaries import re workbook_path = r"C:\temp\prams_test\\" workbook_name = "PRAMS-MCH-Indicators-2020-508.xlsx" outfile_root = r"C:\temp\prams_test\source_csv\\" wb = load_workbook(workbook_path + workbook_name) def replace_non_alphanumeric(string, replacement=''): return re.sub(r'[^0-9a-zA-Z_]+', replacement, string.strip()) def extract_tables(ws, sheet_name): dfs_tmp = {} for name, table_range in ws.tables.items(): # Get position of data table defined by named range min_col, min_row, max_col, max_row = range_boundaries(table_range) # read value of cell relative to top left corner of named range reference_cell = ws[table_range.split(':')[0]] table_name = reference_cell.offset(row=-1).value table_name = table_name.replace(',',' ') # Convert table to DataFrame table = ws.iter_rows(min_row, max_row, min_col, max_col, values_only=True) header = next(table) df = pd.DataFrame(table, columns=header) # add columns workbook, sheet_name and table_name to data frame df.insert(0, "workbook_name", workbook_name) df.insert(1, "sheet_name", sheet_name) df.insert(2, "table_name", table_name) # write data frame to .csv without header row and index column df.iloc[1:].to_csv(outfile_root + replace_non_alphanumeric(workbook_name.split('.')[0]).lower() + "_" + replace_non_alphanumeric(sheet_name).lower() + "_" + replace_non_alphanumeric(table_name).lower() + ".csv" , index=False, header=False) dfs_tmp[name] = df return dfs_tmp # Dictionary to store all the dfs in. # Format: {table_name1: df, table_name2: df, ...} dfs = {} for ws in wb.worksheets: dfs.update(extract_tables(ws, ws.title))
Above creates per table a separate .csv with a naming convention: <workbook name>_<sheet name>_<table name (derived from the table title)>.csv
Step 2: Read the .csv's into SAS
You can use a wildcard in the csv name to for example read all the tables belonging to a workbook or to only read the tables belonging to a single worksheet. And you can of course also already sub-set your data as part of reading it.
data want;
infile "C:\temp\prams_test\source_csv\pramsmchindicators2020508_breastfeedingpractices_*.csv"
truncover dsd dlm=",";
attrib
workbook informat=$200. label='Name of Excel Workbook'
sheet informat=$31. label='Name of Excel Sheet'
table_name informat=$200. label='Name of table in Excel Sheet'
site_name informat=$100. label='State'
denominator informat=best32. label='N (Denominator) - Unweighted Sample Size'
numerator informat=best32. label='N (Numerator) - Unweighted Frequency'
weight informat=best32. label='Weighted %'
lower95 informat=best32. label='Lower 95% - Confidence Interval'
upper95 informat=best32. label='Upper 95% - Confidence Interval'
;
input
workbook
sheet
table_name
site_name
denominator
numerator
weight
lower95
upper95
;
if site_name in ('Alabama','Florida');
run;
proc print data=want;
run;
Btw: The value for table name is sourced from the title above the table
And because this cell was not part of the named range for the table I needed below code to derive it
# read value of cell relative to top left corner of named range reference_cell = ws[table_range.split(':')[0]] table_name = reference_cell.offset(row=-1).value table_name = table_name.replace(',',' ')
So assuming you are right and those are the XLSX files of interest they should not be hard to read with SAS because all of the variables appear to be numeric. You just need to start reading in cell A4 and tell SAS to not invent variable names. You can then read the whole sheet and transpose it to get the metadata for the variables.
I tried it for the first two sheets for one of the files and the metadata (column headers) are exactly the same.
filename x "C:\downloads\PRAMS-MCH-Indicators-2016-2021.xlsx";
proc import dbms=xlsx datafile=x out=meta_2016 replace ;
getnames=NO;
range='2016$A1:';
run;
proc import dbms=xlsx datafile=x out=data_2016 replace ;
getnames=NO;
range='2016$A4:';
run;
proc transpose data=meta_2016(obs=3) out=names_2016;
var _all_;
run;
proc import dbms=xlsx datafile=x out=meta_2017 replace ;
getnames=NO;
range='2017$A1:';
run;
proc import dbms=xlsx datafile=x out=data_2017 replace ;
getnames=NO;
range='2017$A4:';
run;
proc transpose data=meta_2017(obs=3) out=names_2017;
var _all_;
run;
proc compare data=names_2016 compare=names_2017;
run;
You can process the headers to generate your own unique variable names.
data newnames_2017 ;
set names_2017 ;
length varname $32 label $256 indicator variable $200 ;
retain indicator variable ;
if _N_=1 then varname='STATE';
else do;
if col1 ne ' ' then do;
ind_num+1;
indicator=col1;
var_num=0;
end;
if col2 ne ' ' then do;
variable=col2;
var_num+1;
sub_num=0;
end;
sub_num+1;
varname=cats('var',ind_num,'_',var_num,'_',sub_num);
label=catx(' - ',indicator,variable,col3);
end;
run;
proc print;
var _name_ varname label ;
run;
So you get something like this:
Which you can use to generate code to RENAME the variables and attach a label.
Or perhaps use to reshape the data into more normal form since it looks like that third level of name is the same for every variable:
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.