Hi Everyone,
I want to import a csv file with 3 columns of the following format: Name, price, time(yyyy-mm-dd hh:mm:sss) and then extract date and time.
One of the problem is price sometime has 'N/A' so I have to use guessingrows in my code.
I wonder if there is anything else can be used since guessingrows might take too much time if I want to guessing=Max.
Thank you,
HHC
Ab,117.9,480751.0,2022-07-19 12:48:00.891313
AX,726.2398,16005050.0,2022-07-19 12:48:00.891313
MI,714.075,N/A,2022-07-19 12:48:00.891313
AXxd,726.2398,16005050.0,2022-07-19 12:48:00.891313
proc import datafile='\Downloads\333_12_44.csv'
out=have replace; guessingrows=3000 ; run;
There is no need to force SAS to GUESS how to read a file with only FOUR variables.
Just write the data step yourself and you will have full control over how the variables are defined and how the N/A strings are handled.
For example you could just use the ?? input modifier to have it ignore the fact that N/A is not a number.
data have ;
infile '\Downloads\333_12_44.csv' dsd truncover firstobs=2;
input char1 :$10. num1 price ?? datetime :anydtdtm. ;
format datetime datetime26.5 ;
run;
There is no need to force SAS to GUESS how to read a file with only FOUR variables.
Just write the data step yourself and you will have full control over how the variables are defined and how the N/A strings are handled.
For example you could just use the ?? input modifier to have it ignore the fact that N/A is not a number.
data have ;
infile '\Downloads\333_12_44.csv' dsd truncover firstobs=2;
input char1 :$10. num1 price ?? datetime :anydtdtm. ;
format datetime datetime26.5 ;
run;
So much thanks, Tom.
HHC
Another approach is to use a custom informat to read known problem values.
proc format; invalue pricena 'N/A' = .N other = [16.] ; run; data have ; infile datalines dlm=',' dsd; input char1 :$10. num1 price :Pricena. datetime :anydtdtm. ; format datetime datetime26.5 ; datalines; Ab,117.9,480751.0,2022-07-19 12:48:00.891313 AX,726.2398,16005050.0,2022-07-19 12:48:00.891313 MI,714.075,N/A,2022-07-19 12:48:00.891313 AXxd,726.2398,16005050.0,2022-07-19 12:48:00.891313 ;
This creates an informat that will read N/A into a special missing value, .N, so that you can tell later that it was read as N/A.
Coupled with the ?? informat modifier your data can tell you some things were handled gracefully and others not.
Consider this example where someone had a data entry error or for some reason entered PDQ instead of price or N/A.
We can tell from the data which missing value was which.
data have2 ; infile datalines dlm=',' dsd; input char1 :$10. num1 price ??:Pricena. datetime :anydtdtm. ; format datetime datetime26.5 ; datalines; Ab,117.9,480751.0,2022-07-19 12:48:00.891313 AX,726.2398,16005050.0,2022-07-19 12:48:00.891313 MI,714.075,N/A,2022-07-19 12:48:00.891313 AXxd,726.2398,PDQ,2022-07-19 12:48:00.891313 ;
The custom informat can be very helpful if you have multiple codes in what should be a numeric field. Consider the possibility of codes with meanings such as: period of record too short, not active yet, not longer valid or similar. Different special missings (up to 27, .A to .Z and ._) can allow you to answer questions about "how many no longer valid products were listed" that a simple missing applied to multiple codes will not. A custom format with the code meanings allows such reports to look nice. The values are still missing for any numeric calculation though such as mean price.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.