I am attempting to refine code written by a former colleague. I am attempting to combine imported datasets into one master dataset (previous data from previous data runs are needed for present and future data runs and I would like to create a large master file instead of my colleague having to import in multiple files from previous data runs). The issue is there are lots of empty cells and when this happens it seems that SAS will read them in as characters, but with the cells with data SAS reads them in as numbers, which is causing the following error messages. Is there a way to change the format in SAS I use SAS 9.4), or will I have to combine the data in excel (there are over 50 spreadsheets with 3 tabs each) Here is the code below:
1261 data allmiss_hosp_info;
1262 set
1263
1264 QA_Miss_Hosp_Info_10NOV2021
1265 QA_Miss_Hosp_Info_17NOV2021
1266 QA_Miss_Hosp_Info_23NOV2021
1267 QA_Miss_Hosp_Info_01DEC2021
1268 QA_Miss_Hosp_Info_08DEC2021
1269 QA_Miss_Hosp_Info_15DEC2021
1270 QA_Miss_Hosp_Info_Xx13
1271 QA_Miss_Hosp_Info_Xx14
1272 QA_Miss_Hosp_Info_Xx15
1273 QA_Miss_Hosp_Info_Xx16
1274 QA_Miss_Hosp_Info_Xx17
1275 QA_Miss_Hosp_Info_Xx18
1276 QA_Miss_Hosp_Info_Xx19
1277 QA_Miss_Hosp_Info_Xx20
1278 QA_Miss_Hosp_Info_Xx21
1279 QA_Miss_Hosp_Info_Xx22
1280 QA_Miss_Hosp_Info_Xx23
1281 QA_Miss_Hosp_Info_Xx24
1282 QA_Miss_Hosp_Info_Xx25
1283 QA_Miss_Hosp_Info_Xx26
1284 QA_Miss_Hosp_Info_Xx27
1285 QA_Miss_Hosp_Info_Xx28
1286 QA_Miss_Hosp_Info_Xx29
1287 QA_Miss_Hosp_Info_Xx30
1288 QA_Miss_Hosp_Info_Xx31
1289 QA_Miss_Hosp_Info_Xx32
1290 QA_Miss_Hosp_Info_Xx33
1291 QA_Miss_Hosp_Info_Xx34
1292 QA_Miss_Hosp_Info_Xx35
1293 QA_Miss_Hosp_Info_Xx36
1294 QA_Miss_Hosp_Info_Xx37
1295 QA_Miss_Hosp_Info_Xx38
1296 QA_Miss_Hosp_Info_Xx39
1297 QA_Miss_Hosp_Info_Xx40
1298 QA_Miss_Hosp_Info_Xx41
1299 QA_Miss_Hosp_Info_Xx42
1300 QA_Miss_Hosp_Info_Xx43
1301 QA_Miss_Hosp_Info_Xx44
1302 QA_Miss_Hosp_Info_Xx45
1303 QA_Miss_Hosp_Info_Xx46
1304 QA_Miss_Hosp_Info_Xx47
1305 QA_Miss_Hosp_Info_Xx48
1306 QA_Miss_Hosp_Info_Xx49
1307 QA_Miss_Hosp_Info_Xx50
1308 QA_Miss_Hosp_Info_Xx51
1309 QA_Miss_Hosp_Info_Xx52
1310 QA_Miss_Hosp_Info_Xx53
1311 QA_Miss_Hosp_Info_Xx54
1312 QA_Miss_Hosp_Info_Xx55
1313 QA_Miss_Hosp_Info_Xx56
1314 QA_Miss_Hosp_Info_Xx57
1315 QA_Miss_Hosp_Info_Xx58
1316 QA_Miss_Hosp_Info_Xx59
1317 QA_Miss_Hosp_Info_Xx60
1318 QA_Miss_Hosp_Info_Xx61
1319 QA_Miss_Hosp_Info_Xx62
1320 QA_Miss_Hosp_Info_Xx63
1321 QA_Miss_Hosp_Info_Xx64
1322 QA_Miss_Hosp_Info_Xx65
1323 QA_Miss_Hosp_Info_Xx66
1324 QA_Miss_Hosp_Info_Xx67
1325 QA_Miss_Hosp_Info_Xx68
1326 QA_Miss_Hosp_Info_Xx69
1327 QA_Miss_Hosp_Info_Xx70
1328 QA_Miss_Hosp_Info_Xx71
1329 QA_Miss_Hosp_Info_Xx72
1330 QA_Miss_Hosp_Info_Xx73
1331 QA_Miss_Hosp_Info_Xx74
1332 QA_Miss_Hosp_Info_Xx75
1333 QA_Miss_Hosp_Info_Xx76
1334 QA_Miss_Hosp_Info_Xx77
1335 QA_Miss_Hosp_Info_Xx78
1336 QA_Miss_Hosp_Info_Xx79
1337 QA_Miss_Hosp_Info_Xx80
1338 QA_Miss_Hosp_Info_Xx81;
ERROR: Variable IntubationDate has been defined as both character and numeric.
ERROR: Variable ExtubationDate has been defined as both character and numeric.
ERROR: Variable HospDtlICUAdmitDt has been defined as both character and numeric.
ERROR: Variable HospDtlICUDischDt has been defined as both character and numeric.
ERROR: Variable HospDtlIntub1StartDt has been defined as both character and numeric.
ERROR: Variable HospDtlIntub1EndDt has been defined as both character and numeric.
ERROR: Variable repeat has been defined as both character and numeric.
ERROR: Variable repeat has been defined as both character and numeric.
ERROR: Variable NoAdmitDate has been defined as both character and numeric.
The source is excel, here is the code my former colleague wrote to import them in:
/*import data from excel that the team has reviewed to SAS dataset*/
%MACRO CheckExcel(ex=, tab=);
PROC IMPORT OUT=WORK.&tab
/*DATAFILE= "\\hospital patient data\Hosp_Death_Report\ team notes\&ex..xlsx"*/
DATAFILE= "\\hospital patient data\Hosp_Match\QA\Data\ team notes\&ex..xlsx"
DBMS=EXCEL REPLACE;
RANGE="&tab$";
GETNAMES=YES;
MIXED=NO;
SCANTEXT=YES;
USEDATE=YES;
SCANTIME=YES;
RUN;
%MEND CheckExcel;
%CheckExcel(ex=QA_ICU_Intub_X2_09SEP2021, tab=QA_IcuInt_09SEP2021);
%CheckExcel(ex=QA_ICU_Intub_X3_15SEP2021, tab=QA_IcuInt_15SEP2021);
%CheckExcel(ex=QA_ICU_Intub_Xx1_22SEP2021,tab=QA__IcuInt_Other_22SEP2021);
This is a known problem with trying to use a SPREADSHEET as a data source. Spreadsheets are for doing adhoc calculations so therefore they allow any cell to contain any information. A DATASET has variables, so every for every observation in the dataset the variable has to have the same type of value. You should be able to find a gazillion questions on this forum from users asking how to "fix" the values imported from a spreadsheet.
So if you can modify the process to use some other file format (a delimited text file for example) then you will have more flexibility to read the data in using consist data structure. With a text file you can write your own data step to read the file.
It looks like most of your problems are caused by empty columns. When PROC IMPORT sees an empty column in it creates it as a character variable of length 1 since it will require 7 fewer bytes than it takes to store the 64-bit binary values SAS uses for numbers. You might want to see if some of the solutions on this site for dropping empty variables help you.
Is this a one time need to combine these datasets? If so just figure out which datasets have the variables with the wrong type and use dataset options to either remove them or rename them so you do not get errors.
Or do you need something that will adjust to the next crazy spreadsheet you have to read? In that case you need some idea of what structure you want to create. Then you can compare the structure that PROC IMPORT created and make code to fix it.
The previous datasets will only need to be combined into a master file once, what my plan is, is to save this as a SAS dataset so that I don't have the issues that exist with excel. Then each week new data will be added to the master file (but this data will be created wholly within SAS).
Before you combine the datasets, you will have to go through them one by one and fix any issues. This is tedious work, but you'll only do it once.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.