I have a dataset w/ times stored as character strings. In some strings, there is a space between the trailing number, usually 0, and the AM/PM designation (e.g. 10:00 PM). In others, there is no space (8:00PM).
How can I create consistency in these data so that there is a space after the time and before the AM/PM designation?
Obs _300101 _300102 _300201 _300202
1 8:00PM 10:00 PM 4:00 AM 9:00AM
2 1:00 AM 2:00 AM 10:00 AM 12:00 PM
3 12:00AM 11:30AM 8:00AM 6:30AM
4 12:00 AM 2:00 AM 6:00 AM 11:00 AM
Thank you!
Don't store them as character strings.
Use Input to convert them to time variables and then format them as time variables.
@Reeza Could you help me do that?
I'm not having success assigning informats to the data (See code).
DATA timedata_a1_a5 ; INFILE "/folders/myfolders/XXXX/XXXX/! Datasets/Original Datasets/Time Data A1-A5.csv" DLM="," DSD TERMSTR=CR; INPUT PATIENTID a1_300101 TIME8. a1_300102 TIME8. a1_300201 TIME8. a1_300202 TIME8.; RUN;
1 OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
55
56 DATA timedata_a1_a5 ;
57 INFILE "/folders/myfolders/XXX/XXX/! Datasets/Original Datasets/Time Data A1-A5.csv" DLM="," DSD TERMSTR=CR;
58 INPUT PATIENTID
59 a1_300101 TIME8. a1_300102TIME8. a1_300201 TIME8. a1_300202 TIME8. ;
60 RUN;
NOTE: The infile "/folders/myfolders/UCSD/SMART/! Datasets/Original Datasets/Time Data A1-A5.csv" is:
Filename=/folders/myfolders/UCSD/SMART/! Datasets/Original Datasets/Time Data A1-A5.csv,
Owner Name=root,Group Name=vboxsf,
Access Permission=-rwxrwx---,
Last Modified=11Mar2016:12:04:45,
File Size (bytes)=60019
NOTE: Invalid data for a1_300102 in line 1 12-19.
NOTE: Invalid data for a1_300201 in line 1 20-27.
NOTE: Invalid data for a1_300202 in line 1 28-35.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+----9----+----0
1 26,11:00 PM,01:00 AM,09:00 AM,10:30 AM,9:00 PM,10:00 PM,10:00 AM,8:30 AM,10:00 PM,11:00 PM,7:00 AM,8
101 :00 AM,8:00 PM,10:00 PM,4:00 AM,9:00 AM,10:00 PM,12:00 AM,4:30 AM,9:00 AM 173
PATIENTID=26 a1_300101=82800 a1_300102=. a1_300201=. a1_300202=. _ERROR_=1 _N_=1
NOTE: Invalid data for a1_300101 in line 2 4-11.
NOTE: Invalid data for a1_300201 in line 2 20-27.
NOTE: Invalid data for a1_300202 in line 2 28-35.
Etc...
You don't have an informat statement in that code.
Maybe start there?
TIME8. comes from this source, which identifies it as an Informat. What am I missing?
So now that I can type on a full computer 🙂
A full data step import contains the following statements (see code below). You only have the input/infile statements.
Add in an informat and format to control how your data comes in and is displayed. It's cumbersome and annoying as heck, which is why I suggested proc import and then modifying the code.
Having spent the last 2 hours developing code to read in a file that didn't have appropriate documentation I understand your pain :).
Doing it this way will save you a lot of issues down the road...and many questions on here :).
Good Luck!
Data want;
informat ****;
format ****;
infile ***;
input ***;
run;
data have;
input Obs (_300101 _300102 _300201 _300202) (& $40.);
cards;
1 8:00PM 10:00 PM 4:00 AM 9:00AM
2 1:00 AM 2:00 AM 10:00 AM 12:00 PM
3 12:00AM 11:30AM 8:00AM 6:30AM
4 12:00 AM 2:00 AM 6:00 AM 11:00 AM
;
run;
data want;
set have;
array x{*} $ _:;
array y{4} $ 40;
do i=1 to dim(x);
y{i}=prxchange('s/\s*(am|pm)$/ $1/i',-1,strip(x{i}));
end;
drop i;
run;
proc print;run;
Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.
Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.