Hi Everyone,
I want to import a csv file with 3 columns of the following format: Name, price, time(yyyy-mm-dd hh:mm:sss) and then extract date and time.
One of the problem is price sometime has 'N/A' so I have to use guessingrows in my code.
I wonder if there is anything else can be used since guessingrows might take too much time if I want to guessing=Max.
Thank you,
HHC
Ab,117.9,480751.0,2022-07-19 12:48:00.891313
AX,726.2398,16005050.0,2022-07-19 12:48:00.891313
MI,714.075,N/A,2022-07-19 12:48:00.891313
AXxd,726.2398,16005050.0,2022-07-19 12:48:00.891313
proc import datafile='\Downloads\333_12_44.csv'
out=have replace; guessingrows=3000 ; run;
There is no need to force SAS to GUESS how to read a file with only FOUR variables.
Just write the data step yourself and you will have full control over how the variables are defined and how the N/A strings are handled.
For example you could just use the ?? input modifier to have it ignore the fact that N/A is not a number.
data have ;
infile '\Downloads\333_12_44.csv' dsd truncover firstobs=2;
input char1 :$10. num1 price ?? datetime :anydtdtm. ;
format datetime datetime26.5 ;
run;
There is no need to force SAS to GUESS how to read a file with only FOUR variables.
Just write the data step yourself and you will have full control over how the variables are defined and how the N/A strings are handled.
For example you could just use the ?? input modifier to have it ignore the fact that N/A is not a number.
data have ;
infile '\Downloads\333_12_44.csv' dsd truncover firstobs=2;
input char1 :$10. num1 price ?? datetime :anydtdtm. ;
format datetime datetime26.5 ;
run;
So much thanks, Tom.
HHC
Another approach is to use a custom informat to read known problem values.
proc format; invalue pricena 'N/A' = .N other = [16.] ; run; data have ; infile datalines dlm=',' dsd; input char1 :$10. num1 price :Pricena. datetime :anydtdtm. ; format datetime datetime26.5 ; datalines; Ab,117.9,480751.0,2022-07-19 12:48:00.891313 AX,726.2398,16005050.0,2022-07-19 12:48:00.891313 MI,714.075,N/A,2022-07-19 12:48:00.891313 AXxd,726.2398,16005050.0,2022-07-19 12:48:00.891313 ;
This creates an informat that will read N/A into a special missing value, .N, so that you can tell later that it was read as N/A.
Coupled with the ?? informat modifier your data can tell you some things were handled gracefully and others not.
Consider this example where someone had a data entry error or for some reason entered PDQ instead of price or N/A.
We can tell from the data which missing value was which.
data have2 ; infile datalines dlm=',' dsd; input char1 :$10. num1 price ??:Pricena. datetime :anydtdtm. ; format datetime datetime26.5 ; datalines; Ab,117.9,480751.0,2022-07-19 12:48:00.891313 AX,726.2398,16005050.0,2022-07-19 12:48:00.891313 MI,714.075,N/A,2022-07-19 12:48:00.891313 AXxd,726.2398,PDQ,2022-07-19 12:48:00.891313 ;
The custom informat can be very helpful if you have multiple codes in what should be a numeric field. Consider the possibility of codes with meanings such as: period of record too short, not active yet, not longer valid or similar. Different special missings (up to 27, .A to .Z and ._) can allow you to answer questions about "how many no longer valid products were listed" that a simple missing applied to multiple codes will not. A custom format with the code meanings allows such reports to look nice. The values are still missing for any numeric calculation though such as mean price.
April 27 – 30 | Gaylord Texan | Grapevine, Texas
Walk in ready to learn. Walk out ready to deliver. This is the data and AI conference you can't afford to miss.
Register now and lock in 2025 pricing—just $495!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.