Hello everyone,
I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:
1. Identify whether the time variable is character or numeric (Using the VTYPE function for this)
2. If character, then convert to numeric time
The end result shows that all of the time variables have been converted to a numeric format (time20.3), but have the all of the values are missing. Any help or suggestions would be greatly appreciated!
Here is the Macro that I have tried:
%macro timewarp;
data &file._;
set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp));
admit=vtype(admitwarp);
dept=vtype(deptwarp);
if admit="C" then do;
admitTime=input(admitwarp,time20.3);
format admitTime time20.3;
end;
if dept="C" then do;
deptchecktime=input(deptwarp,time20.3);
format deptchecktime time20.3;
end;
%mend;
Thank you in advance,
Carlos
SAS V9.4
@_CJY_ wrote:
Hello everyone,
I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:
Are you using proc import to read these? Are they supposed to be the same file layout? Then likely you should be using a datastep to read the external file. Proc Import guesses each and every time a file is read. So lengths of variables and types can change depending on the actual content of the first few rows of data.
In a data step you can specify a specific informat to read the data. If the content changes you may be able to use the Anydtdte, Anydttme or Anydtdtm informats though I would carefully examine the results as some assumptions are made when attempting to read a (garbage format) date such as 101112 , which might be 10 Nov 2012, 10 Nov 1912, 11 Oct 2012, 11 Oct 1912, 12 Nov 2010 or 12 Nov 1910.
You might also ask the data source why the content keeps changing. Maybe someone else needs to clean up a process.
If your CSV files are coming from a spreadsheet you might also try setting cell properties for the date or time columns to a single standard value type before exporting.
There are also some issues around the actual length of your character variable and informat used. If your character value has leading spaces the informat may read incorrectly
For more specific help it might help to show the actual values of the variables.
BTW there isn't any need to add new variables just to check the type. You can use
if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);
There is also no reason to include the FORMAT statement in a do block as FORMAT is not executable and is applied wherever it occurs in the code.
Hi Reeza,
Yes all of the files have the same structure and variable names. I used a simple PROC IMPORT to upload each one (below).
proc import
datafile="pathname\file.csv"
dbms=csv replace
out=file_work;
guessingrows=900;
run;
What else would I need to add to import them as the correct format?
PROC IMPORT does not allow you to specify data types and formats. But it will generate the skeleton code for you. Run it once for a file, copy the code from the log. Make sure the types and formats are correct there. Then use that same code to read your other files, this means they'll all have the same structure.
You can add GUESSINGROWS=MAX to your PROC IMPORT as well, it will be slower but will do a better job at guessing.
@_CJY_ wrote:
Hi Reeza,
Yes all of the files have the same structure and variable names. I used a simple PROC IMPORT to upload each one (below).
proc import datafile="pathname\file.csv" dbms=csv replace out=file_work; guessingrows=900; run;
What else would I need to add to import them as the correct format?
We would need to have access to a portion of your data to determine why you are getting missing values.
Please provide a portion of the data as a SAS data step
data WORK.FORUM_SAMPLE;
infile datalines dsd truncover;
input deptCheckTime:TIME20.3 admitTime:$5.;
format deptCheckTime TIME20.3;
datalines;
12:09:00.000 10:46
16:56:00.000 17:37
13:31:00.000 1:45
13:33:00.000 13:49
11:50:00.000 21:02
13:41:00.000 23:10
10:50:00.000 14:59
14:41:00.000 10:51
14:41:00.000 10:51
12:35:00.000 22:45
;;;;
Thank you for the code to make this easier to understand. I included a small sample where there is a mix of both character and numeric.
@_CJY_ wrote:
Hello everyone,
I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:
Are you using proc import to read these? Are they supposed to be the same file layout? Then likely you should be using a datastep to read the external file. Proc Import guesses each and every time a file is read. So lengths of variables and types can change depending on the actual content of the first few rows of data.
In a data step you can specify a specific informat to read the data. If the content changes you may be able to use the Anydtdte, Anydttme or Anydtdtm informats though I would carefully examine the results as some assumptions are made when attempting to read a (garbage format) date such as 101112 , which might be 10 Nov 2012, 10 Nov 1912, 11 Oct 2012, 11 Oct 1912, 12 Nov 2010 or 12 Nov 1910.
You might also ask the data source why the content keeps changing. Maybe someone else needs to clean up a process.
If your CSV files are coming from a spreadsheet you might also try setting cell properties for the date or time columns to a single standard value type before exporting.
There are also some issues around the actual length of your character variable and informat used. If your character value has leading spaces the informat may read incorrectly
For more specific help it might help to show the actual values of the variables.
BTW there isn't any need to add new variables just to check the type. You can use
if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);
There is also no reason to include the FORMAT statement in a do block as FORMAT is not executable and is applied wherever it occurs in the code.
Balladrw,
Thank you so much for your suggestions. All of it was extremely useful. Did not realize FORMAT function would not work inside a do block.
data &file._;
set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp));
if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);
format admitTime time20.3;
if vtype(deptwarp)= "C" then deptchecktime=input(deptwarp,time20.3);
format deptcheckTime time20.3;
drop deptwarp admitwarp;
run;
@_CJY_ wrote:
Balladrw,
Thank you so much for your suggestions. All of it was extremely useful. Did not realize FORMAT function would not work inside a do block.
data &file._; set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp)); if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3); format admitTime time20.3; if vtype(deptwarp)= "C" then deptchecktime=input(deptwarp,time20.3); format deptcheckTime time20.3; drop deptwarp admitwarp; run;
Format works just fine. The location in the code doesn't matter unless you have two format statements for the same variable in which case the last one encountered is going to apply. See below for brief example.
data work.example; format x best4.; do; x = 1.234; format x percent8.3; end; run; data work.example2; format x percent8.3; do; x = 1.234; end; format x best4.; run;
There are several non-executable statements such as Format, Informat, Attrib, and Label.
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.