I have a dataset w/ times stored as character strings. In some strings, there is a space between the trailing number, usually 0, and the AM/PM designation (e.g. 10:00 PM). In others, there is no space (8:00PM).
How can I create consistency in these data so that there is a space after the time and before the AM/PM designation?
Obs _300101 _300102 _300201 _300202
1 8:00PM 10:00 PM 4:00 AM 9:00AM
2 1:00 AM 2:00 AM 10:00 AM 12:00 PM
3 12:00AM 11:30AM 8:00AM 6:30AM
4 12:00 AM 2:00 AM 6:00 AM 11:00 AM
Thank you!
Don't store them as character strings.
Use Input to convert them to time variables and then format them as time variables.
@Reeza Could you help me do that?
I'm not having success assigning informats to the data (See code).
DATA timedata_a1_a5 ; INFILE "/folders/myfolders/XXXX/XXXX/! Datasets/Original Datasets/Time Data A1-A5.csv" DLM="," DSD TERMSTR=CR; INPUT PATIENTID a1_300101 TIME8. a1_300102 TIME8. a1_300201 TIME8. a1_300202 TIME8.; RUN;
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 DATA timedata_a1_a5 ;
57 INFILE "/folders/myfolders/XXX/XXX/! Datasets/Original Datasets/Time Data A1-A5.csv" DLM="," DSD TERMSTR=CR;
58 INPUT PATIENTID
59 a1_300101 TIME8. a1_300102TIME8. a1_300201 TIME8. a1_300202 TIME8. ;
60 RUN;
NOTE: The infile "/folders/myfolders/UCSD/SMART/! Datasets/Original Datasets/Time Data A1-A5.csv" is:
Filename=/folders/myfolders/UCSD/SMART/! Datasets/Original Datasets/Time Data A1-A5.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=11Mar2016:12:04:45,
File Size (bytes)=60019
NOTE: Invalid data for a1_300102 in line 1 12-19.
NOTE: Invalid data for a1_300201 in line 1 20-27.
NOTE: Invalid data for a1_300202 in line 1 28-35.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 26,11:00 PM,01:00 AM,09:00 AM,10:30 AM,9:00 PM,10:00 PM,10:00 AM,8:30 AM,10:00 PM,11:00 PM,7:00 AM,8
101 :00 AM,8:00 PM,10:00 PM,4:00 AM,9:00 AM,10:00 PM,12:00 AM,4:30 AM,9:00 AM 173
PATIENTID=26 a1_300101=82800 a1_300102=. a1_300201=. a1_300202=. _ERROR_=1 _N_=1
NOTE: Invalid data for a1_300101 in line 2 4-11.
NOTE: Invalid data for a1_300201 in line 2 20-27.
NOTE: Invalid data for a1_300202 in line 2 28-35.
Etc...
You don't have an informat statement in that code.
Maybe start there?
TIME8. comes from this source, which identifies it as an Informat. What am I missing?
So now that I can type on a full computer 🙂
A full data step import contains the following statements (see code below). You only have the input/infile statements.
Add in an informat and format to control how your data comes in and is displayed. It's cumbersome and annoying as heck, which is why I suggested proc import and then modifying the code.
Having spent the last 2 hours developing code to read in a file that didn't have appropriate documentation I understand your pain :).
Doing it this way will save you a lot of issues down the road...and many questions on here :).
Good Luck!
Data want;
informat ****;
format ****;
infile ***;
input ***;
run;
data have;
input Obs (_300101 _300102 _300201 _300202) (& $40.);
cards;
1 8:00PM 10:00 PM 4:00 AM 9:00AM
2 1:00 AM 2:00 AM 10:00 AM 12:00 PM
3 12:00AM 11:30AM 8:00AM 6:30AM
4 12:00 AM 2:00 AM 6:00 AM 11:00 AM
;
run;
data want;
set have;
array x{*} $ _:;
array y{4} $ 40;
do i=1 to dim(x);
y{i}=prxchange('s/\s*(am|pm)$/ $1/i',-1,strip(x{i}));
end;
drop i;
run;
proc print;run;
Available on demand!
Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.