Hi,
I originally received 2 Excel sheets to merge together for this project. After I merged them, I successfully corrected the dates using this code:
DATA SubQ.dataclean1;
SET SubQ.data;
date_varXR1 = Input(postop_XR_date, mmddyy10.); /* Change dates from character to date values */
format date_varXR1 mmddyy10.;
DROP postop_XR_date;
RENAME date_varXR1=postop_XR_date;
XR2_num = Input(Second_Xray_date, mmddyy10.);
date_varXR2 = XR_num - 21916;
format date_varXR2 mmddyy10.;
DROP Second_Xray_date;
RENAME date_varXR2=Second_Xray_date;
RUN;
However, they just gave me a more recent Excel file with the data already combined, and now the same code isn't working. Is there something different about this Excel file that is causing an issue? I have attached the file with only the dates (removed private patient info).
Thanks!
Excel is just not a great data source. Below code should work for you.
PROC IMPORT
OUT= work.temp
DATAFILE= "c:\temp\Master Compiled Spreadsheet 09-14-2022 Kelly edit dates only.xlsx"
DBMS=xlsx REPLACE;
SHEET="CPT 23472";
/* ensure that all columns get always read as character */
GETNAMES=no;
run;
/* read csv into SAS table */
data want;
attrib
postop_XR_date length=8 format=date9.
Immediate_postop_XR_findings length=$1000
Second_Xray_date length=8 format=date9.
Date_res_of_gas length=8 format=date9.
Infection length=$2
;
set work.temp(firstobs=2);
/* for dates: convert string with Excel count of days to SAS Date value */
postop_XR_date =input(a,?? 32.) + '30DEC1899'd ;
Immediate_postop_XR_findings =b;
Second_Xray_date =input(c,?? 32.) + '30DEC1899'd ;
Date_res_of_gas =input(d,?? 32.) + '30DEC1899'd ;
Infection =e;
drop a b c d e;
run;
/* clean-up */
proc delete data=work.temp;
run;quit;
Because the Excel dates get read as character the SAS columns contain a string of digits that is the count of days like Excel does it. To convert this to a SAS Date value which is the count of days since 1/1/1960 we need to add more days to it as discussed and solved here.
I'm using getnames=no with Proc Import to ensure that the Excel columns always get converted to a SAS character variable. This to ensure stable results even for the day where you get an Excel with no blank cells where SAS would convert the values directly to numerical columns with SAS Date values (if we wouldn't treat the header column as data).
What does 'not working' mean? Are you getting an error in your log? (Please show) Are you getting an incorrect results? (Please show)
Could be a typo in your code?:
XR2_num = Input(Second_Xray_date, mmddyy10.);
date_varXR2 = XR_num - 21916; * <--- should this be XR2_num - 21916? ;
Run Proc Contents on the data from the new spread sheet and show us the result.
One thing I see as a possibility is that the data may been "imported" as a date value so those Inputs to create date values from character may not be needed and result in messages in the log about invalid data. Or that the text of dates changed so that the informat to read the data needs to change.
Please post LOG as text, copy the text and on the forum open a text box and paste the log text.
Pictures are much harder to do things like copy text and show a suggested change, as in I'm too lazy to retype much if any of your log.
You might want to spend some time scanning the forum about all the inconsistencies encountered from one version to another of "same structure" spreadsheet files that do not import "correctly".
That is not a very friendly way to share data. The workbook has three sheets. Two of them have names that are not valid SAS names. And the third is completely empty.
915 libname xx xlsx "C:\downloads\Master Compiled Spreadsheet 09-14-2022 Kelly edit dates only.xlsx"; NOTE: Libref XX was successfully assigned as follows: Engine: XLSX Physical Name: C:\downloads\Master Compiled Spreadsheet 09-14-2022 Kelly edit dates only.xlsx 916 proc copy inlib=xx outlib=work; 917 run; NOTE: Copying XX.CPT 23472 to WORK.CPT 23472 (memtype=DATA). NOTE: Variable Name Change. Immediate postop XR findings -> Immediate_postop_XR_findings NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used. ERROR: The value CPT 23472 is not a valid SAS name. ERROR: File WORK.'CPT 23472'n.DATA has not been saved because copy could not be completed. NOTE: Copying XX.MYBI2.CUMEDICINE.US BISM SERVIC to WORK.MYBI2.CUMEDICINE.US BISM SERVIC (memtype=DATA). NOTE: Variable Name Change. 23472 -> _23472 NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used. ERROR: The value MYBI2.CUMEDICINE.US BISM SERVIC is not a valid SAS name. ERROR: File WORK.'MYBI2.CUMEDICINE.US BISM SERVIC'n.DATA has not been saved because copy could not be completed. NOTE: Copying XX.SHEET1 to WORK.SHEET1 (memtype=DATA). NOTE: BUFSIZE is not cloned when copying across different engines. System Option for BUFSIZE was used. NOTE: The import data set has 0 observations and 0 variables. NOTE: There were 0 observations read from the data set XX.SHEET1. NOTE: The data set WORK.SHEET1 has 0 observations and 0 variables. NOTE: Statements not processed because of errors noted above. NOTE: PROCEDURE COPY used (Total process time): real time 0.06 seconds cpu time 0.04 seconds NOTE: The SAS System stopped processing this step because of errors.
If you set the VALIDMEMNAME option to EXTEND you can use name literals for member names. T
Let's check out that first sheet to see what is going on.
proc contents data=xx.'CPT 23472'n varnum; run;
You seem to have 3 variables that from their names appear like they should have date values, but instead are character strings.
# Variable Type Len Format Informat Label 1 postop_XR_date Char 8 $8. $8. postop_XR_date 2 Immediate_postop_XR_findings Char 640 $640. $640. Immediate postop XR findings 3 Second_Xray_date Char 5 $5. $5. Second_Xray_date 4 Date_res_of_gas Char 5 $5. $5. Date_res_of_gas 5 Infection Char 2 $2. $2. Infection
So let's look at a few of the values to see what is happening. No need to look at all 600+ characters of that long string.
proc print data=xx.'CPT 23472'n(obs=10) width=min;
format _character_ $32. ;
run;
Result:
Second_ Date_ postop_ Xray_ res_of_ Obs XR_date Immediate_postop_XR_findings date gas Infection 1 42604 Reverse total shoulder arthropla 42620 N 2 43633 Re-redemonstration right reverse 43649 N 3 43115 A reverse left total shoulder pr 43152 N 4 43297 Compared with December 7, 2017. 43327 N 5 43560 Interval reverse shoulder arthro 43574 43602 N 6 42506 New reverse total shoulder prost 42517 N 7 43724 Interval right shoulder arthropl 43739 N 8 42580 A reverse right total shoulder p 42592 N 9 42723 There is been interval placement 42734 N 10 42681 There is been interval placement 42690 N
So it clearly looks like some of the cells in those three columns have TEXT instead of DATE values. So when SAS converted the sheet into a dataset it had to make the variable CHARACTER to handle those few cells with text in them. When it does that it stores the actual number Excel uses to represent the date values as a raw digit string. Hence the 5 digit strings we see in the first 10 observations.
So let's see what type of gibberish is confusing excel and SAS about what type of variables those should be.
proc print data=xx.'CPT 23472'n(obs=10) width=min;
where not (lengthn(postop_XR_date) in (0,5))
or not (lengthn(Second_Xray_date) in (0,5))
or not (lengthn(Date_res_of_gas) in (0,5))
;
format _character_ $32. ;
run;
So the first one has '1/29/18'. That looks like a date to me. Perhaps whoever built the spreadsheet is in a region that prefers to enter dates in DMY order instead. If so then there is no month number 29.
The second one has values like 'NONE'. That is NOT a date in any shape.
The third one has values like 'N/A' or 'NA'. Neither of those is a date either.
Fixing the second two looks simple. Just convert the string to a number and add '30DEC1899'd to adjust for the difference in offset date. (Do NOT use magic numbers like the one in your post, how can anyone reading that understand why you used that particular number?)
For the first one you might want to add some more logic to check if the string is actually in MDY style instead.
data want;
set xx.'CPT 23472'n ;
array dates postop_XR_date Second_Xray_date Date_res_of_gas ;
array daten postop_XR_date_n Second_Xray_date_n Date_res_of_gas_n ;
do over dates;
daten=input(dates,??5.);
if not missing(daten) then daten=daten+'30DEC1899'd ;
else daten=input(dates,??mmddyy10.);
end;
format postop_XR_date_n Second_Xray_date_n Date_res_of_gas_n yymmdd10.;
run;
proc print data=want (obs=10) width=min;
format _character_ $32.;
run;
Your problems come from the fact that you use the worst available file format for data interchange. Use csv or other text files and read them with data steps, and you'll have full control over column types and attributes
Excel is just not a great data source. Below code should work for you.
PROC IMPORT
OUT= work.temp
DATAFILE= "c:\temp\Master Compiled Spreadsheet 09-14-2022 Kelly edit dates only.xlsx"
DBMS=xlsx REPLACE;
SHEET="CPT 23472";
/* ensure that all columns get always read as character */
GETNAMES=no;
run;
/* read csv into SAS table */
data want;
attrib
postop_XR_date length=8 format=date9.
Immediate_postop_XR_findings length=$1000
Second_Xray_date length=8 format=date9.
Date_res_of_gas length=8 format=date9.
Infection length=$2
;
set work.temp(firstobs=2);
/* for dates: convert string with Excel count of days to SAS Date value */
postop_XR_date =input(a,?? 32.) + '30DEC1899'd ;
Immediate_postop_XR_findings =b;
Second_Xray_date =input(c,?? 32.) + '30DEC1899'd ;
Date_res_of_gas =input(d,?? 32.) + '30DEC1899'd ;
Infection =e;
drop a b c d e;
run;
/* clean-up */
proc delete data=work.temp;
run;quit;
Because the Excel dates get read as character the SAS columns contain a string of digits that is the count of days like Excel does it. To convert this to a SAS Date value which is the count of days since 1/1/1960 we need to add more days to it as discussed and solved here.
I'm using getnames=no with Proc Import to ensure that the Excel columns always get converted to a SAS character variable. This to ensure stable results even for the day where you get an Excel with no blank cells where SAS would convert the values directly to numerical columns with SAS Date values (if we wouldn't treat the header column as data).
CSV files are much easier to deal with because you do not need to use PROC IMPORT to GUESS how to read them. You can instead just write your own data step that reads them directly.
The one thing where XLSX files are better is when the values of one or more cells contain end of line characters.
Actually: Using the EXCEL engine there is a direct way to get to the desired result.
options validvarname=v7;
PROC IMPORT
OUT= work.want
DATAFILE= "c:\temp\Master Compiled Spreadsheet 09-14-2022 Kelly edit dates only.xlsx"
DBMS=excel
REPLACE
;
dbdsopts= "dbtype=( postop_XR_date='date'
Second_Xray_date='date'
Date_res_of_gas='date'
)";
;
SHEET="CPT 23472";
GETNAMES=yes;
run;
I wish the XLSX engine would support DBSASTYPE for specifying the type of each column when you read in an Excel file. If you agree, please upvote: https://communities.sas.com/t5/SASware-Ballot-Ideas/Add-DBSASTYPE-or-similar-option-to-XLSX-engine-t....
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.