Hi guys, I want to create a dataset with the datalines statement that includes Canadian bankruptcy data by NAICS (straight from https://www.ic.gc.ca/eic/site/bsf-osb.nsf/eng/br02234.html).
My datastep looks a little like this:
data industry_BK;
length Industry $15 NAICS $2;
input Industry $ NAICS $ BK_rate 3.1 Ins_Rate 3.1;
datalines;
Agriculture 11 0.2 0.3
Mining 21 1.3 2.1
Utilities 22 1.2 1.2
Construction 23 1.3 1.8
;
quit;
However, when I run this, I get the following:
It appears to grab the first digit, but loses everything after the decimal.
I've read through a few documentation pages, but clearly I'm just not getting it. Can someone help me to understand why this doesn't work and how I can fix it?
Many thanks!
Hi,
try this, I used the format best12. for the numeric variables.
data industry_BK;
length Industry $15 NAICS $2;
format BK_rate best12. Ins_Rate best12.;
input Industry $ NAICS $ BK_rate Ins_Rate;
datalines;
Agriculture 11 0.2 0.3
Mining 21 1.3 2.1
Utilities 22 1.2 1.2
Construction 23 1.3 1.8
;
quit;Hi,
try this, I used the format best12. for the numeric variables.
data industry_BK;
length Industry $15 NAICS $2;
format BK_rate best12. Ins_Rate best12.;
input Industry $ NAICS $ BK_rate Ins_Rate;
datalines;
Agriculture 11 0.2 0.3
Mining 21 1.3 2.1
Utilities 22 1.2 1.2
Construction 23 1.3 1.8
;
quit;Try this. It should work:
data industry_BK;
length Industry $15 NAICS $2;
input Industry $ NAICS $ BK_rate Ins_Rate 3.1;
datalines;
Agriculture 11 0.2 0.3 
Mining 21 1.3 2.1 
Utilities 22 1.2 1.2 
Construction 23 1.3 1.8 
;
quit;Both of those approaches did the trick, thanks very much both of you.
Any ideas why specifying "3.1" after the input statement on BK_rate seems to ruin things?
If you specify a format without the colon modifier, the length of the format overrides the delimiters. So the second 3.1 format starts to read immediately after the first, and reads a blank, a digit, and the period.
DO NOT use formatted input with delimited data. You do not need to specify ANY informats in your INPUT statement for that data as SAS already knows how to read numbers, but if you do add informat specificatons then you must proceed them with the colon modifier. That will prevent SAS from reading EXACTLY that many characters and messing up how the delimiter processing works.
Also DO NOT add a decimal part to an informat unless you know that the decimal points were purposely removed from the raw text values. Otherwise any values that do not contain a period will be divided by the specified power of ten.
Thanks Kurt and Tom for the detailed answers!
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
