BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
_CJY_
Fluorite | Level 6

Hello everyone,

 

I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:

 

1. Identify whether the time variable is character or numeric (Using the VTYPE function for this)

2. If character, then convert to numeric time

 

The end result shows that all of the time variables have been converted to a numeric format (time20.3), but have the all of the values are missing. Any help or suggestions would be greatly appreciated! 

 

Here is the Macro that I have tried:

 

%macro timewarp;

data &file._;
set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp));
admit=vtype(admitwarp);
dept=vtype(deptwarp);
if admit="C" then do;
admitTime=input(admitwarp,time20.3);
format admitTime time20.3;
end;
if dept="C" then do;
deptchecktime=input(deptwarp,time20.3);
format deptchecktime time20.3;
end;

%mend;

Thank you in advance,

Carlos 

 

SAS V9.4

 

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

@_CJY_ wrote:

Hello everyone,

 

I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:

 

 


Are you using proc import to read these? Are they supposed to be the same file layout? Then likely you should be using a datastep to read the external file. Proc Import guesses each and every time a file is read. So lengths of variables and types can change depending on the actual content of the first few rows of data.

 

In a data step you can specify a specific informat to read the data. If the content changes you may be able to use the Anydtdte, Anydttme or Anydtdtm informats though I would carefully examine the results as some assumptions are made when attempting to read a (garbage format) date such as 101112 , which might be 10 Nov 2012, 10 Nov 1912, 11 Oct 2012, 11 Oct 1912, 12 Nov 2010 or 12 Nov 1910. 

 

You might also ask the data source why the content keeps changing. Maybe someone else needs to clean up a process.

 

If your CSV files are coming from a spreadsheet you might also try setting cell properties for the date or time columns to a single standard value type before exporting.

 

There are also some issues around the actual length of your character variable and informat used. If your character value has leading spaces the informat may read incorrectly

 

For more specific help it might help to show the actual values of the variables.

 

BTW there isn't any need to add new variables just to check the type. You can use

if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);

 

There is also no reason to include the FORMAT statement in a do block as FORMAT is not executable and is applied wherever it occurs in the code.

 

View solution in original post

8 REPLIES 8
Reeza
Super User
Are the five files the same structure and variable names? How did you import them? The correct answer here is go back to the import step and import them as the correct type (numeric/character) and format.
_CJY_
Fluorite | Level 6

Hi Reeza,

 

Yes all of the files have the same structure and variable names. I used a simple PROC IMPORT to upload each one (below). 

 

proc import
		datafile="pathname\file.csv"
		dbms=csv replace
		out=file_work;
		guessingrows=900;
	run;

What else would I need to add to import them as the correct format?

Reeza
Super User

PROC IMPORT does not allow you to specify data types and formats. But it will generate the skeleton code for you. Run it once for a file, copy the code from the log. Make sure the types and formats are correct there. Then use that same code to read your other files, this means they'll all have the same structure. 

 

You can add GUESSINGROWS=MAX to your PROC IMPORT as well, it will be slower but will do a better job at guessing. 

 


@_CJY_ wrote:

Hi Reeza,

 

Yes all of the files have the same structure and variable names. I used a simple PROC IMPORT to upload each one (below). 

 

proc import
		datafile="pathname\file.csv"
		dbms=csv replace
		out=file_work;
		guessingrows=900;
	run;

What else would I need to add to import them as the correct format?


 

PaigeMiller
Diamond | Level 26

We would need to have access to a portion of your data to determine why you are getting missing values.

 

Please provide a portion of the data as a SAS data step

https://communities.sas.com/t5/SAS-Communities-Library/How-to-create-a-data-step-version-of-your-dat...

--
Paige Miller
_CJY_
Fluorite | Level 6
data WORK.FORUM_SAMPLE;
  infile datalines dsd truncover;
  input deptCheckTime:TIME20.3 admitTime:$5.;
  format deptCheckTime TIME20.3;
datalines;
12:09:00.000 10:46
16:56:00.000 17:37
13:31:00.000 1:45
13:33:00.000 13:49
11:50:00.000 21:02
13:41:00.000 23:10
10:50:00.000 14:59
14:41:00.000 10:51
14:41:00.000 10:51
12:35:00.000 22:45
;;;;

Thank you for the code to make this easier to understand. I included a small sample where there is a mix of both character and numeric.

ballardw
Super User

@_CJY_ wrote:

Hello everyone,

 

I have a list of 5 CSV files that do not have uniform formats for its time variables (deptTime and AdmitTime). Some CSV files have both of them in numeric, some both in character, and others are mixed. It becomes an issue when I need to merge all of the 5 into one data set. This seems to be on going issue with no rhyme or reason as to which of the two vars will be in character or numeric format. In order to avoid this issue in the future, I have tried to create a macro to accomplish two things:

 

 


Are you using proc import to read these? Are they supposed to be the same file layout? Then likely you should be using a datastep to read the external file. Proc Import guesses each and every time a file is read. So lengths of variables and types can change depending on the actual content of the first few rows of data.

 

In a data step you can specify a specific informat to read the data. If the content changes you may be able to use the Anydtdte, Anydttme or Anydtdtm informats though I would carefully examine the results as some assumptions are made when attempting to read a (garbage format) date such as 101112 , which might be 10 Nov 2012, 10 Nov 1912, 11 Oct 2012, 11 Oct 1912, 12 Nov 2010 or 12 Nov 1910. 

 

You might also ask the data source why the content keeps changing. Maybe someone else needs to clean up a process.

 

If your CSV files are coming from a spreadsheet you might also try setting cell properties for the date or time columns to a single standard value type before exporting.

 

There are also some issues around the actual length of your character variable and informat used. If your character value has leading spaces the informat may read incorrectly

 

For more specific help it might help to show the actual values of the variables.

 

BTW there isn't any need to add new variables just to check the type. You can use

if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);

 

There is also no reason to include the FORMAT statement in a do block as FORMAT is not executable and is applied wherever it occurs in the code.

 

_CJY_
Fluorite | Level 6

Balladrw,

 

Thank you so much for your suggestions. All of it was extremely useful. Did not realize FORMAT function would not work inside a do block.

 

data &file._;
		set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp));
	 	if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);
		format admitTime time20.3;
		if vtype(deptwarp)= "C" then deptchecktime=input(deptwarp,time20.3);
	 	format deptcheckTime time20.3;
	drop deptwarp admitwarp;
	run;

 

ballardw
Super User

@_CJY_ wrote:

Balladrw,

 

Thank you so much for your suggestions. All of it was extremely useful. Did not realize FORMAT function would not work inside a do block.

 

data &file._;
		set &file (rename=(deptchecktime=deptwarp admitTime=admitwarp));
	 	if vtype(admitwarp)="C" then admitTime=input(admitwarp,time20.3);
		format admitTime time20.3;
		if vtype(deptwarp)= "C" then deptchecktime=input(deptwarp,time20.3);
	 	format deptcheckTime time20.3;
	drop deptwarp admitwarp;
	run;

 


Format works just fine. The location in the code doesn't matter unless you have two format statements for the same variable in which case the last one encountered is going to apply. See below for brief example.

data work.example;
   format x best4.;
   do;
      x = 1.234;
      format x percent8.3;
   end;
run;
data work.example2;
   format x percent8.3;
   do;
      x = 1.234;
   end;
   format x best4.;
run;

There are several non-executable statements such as Format, Informat, Attrib, and Label.

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 8 replies
  • 1761 views
  • 4 likes
  • 4 in conversation