12-09-2016 11:42 AM
So, I'm working with some data that contains decimals, and as usual I've hit a bug that is messing up my data.
The values are small decimals, but when I read them in, any whole numbers get mangled.
infile 'education_expenditure_supplementary_data.txt' MISSOVER DSD DLM='09'x firstobs=2;
input Country : $40.
institute_type : $40.
direct_expenditure_type : $8.
a1995data : 3.1
a2000data : 3.1
a2005data : 3.1
a2009data : 3.1
a2010data : 3.1
a2011data : 3.1;
Raw Data: 4.9 4.9 5 5.4 5.4 5.3
Output: 4.9 4.9 0.5 5.4 5.4 5.3
Basically, any whole number gets turned into a smaller decimal (5 becomes 0.5, 4 becomes 0.4, etc.).
Is this an error in my informat or what?
12-09-2016 11:51 AM
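This works as designed. With the informat 3.1, any value that does not contain a decimal point is divided by 10, because the .1 means one implied decimal place. Values that already contain a decimal point, like 4.9, are read as-is.
So simply use the informat 8.0 (equivalently 8.), which has no implied decimal places.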
What you see dates from a time when people saved space by storing cents instead of dollars, leaving out the decimal point to save a byte. The 12.2 informat would then divide the number by 100, turning the cents back into dollars.
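To make the rule concrete, here is a small model of it. This is Python, not SAS: `read_wd` is a hypothetical helper that mimics only the implied-decimal part of the w.d informat (a sketch assuming plain signed numbers; widths, commas and exponents are not modeled). It reproduces the output from the question with 3.1, the fix with 8.0, and the cents-to-dollars case with 12.2.

```python
def read_wd(field: str, d: int) -> float:
    """Hypothetical model of the implied-decimal rule of a SAS w.d informat.

    Assumption: the field is a plain number with an optional sign and an
    optional decimal point; field width and other SAS details are ignored.
    """
    text = field.strip()
    value = float(text)
    if "." in text:
        # An explicit decimal point in the data wins; d is ignored.
        return value
    # No decimal point: d is the number of implied decimal places.
    return value / 10 ** d


raw = ["4.9", "4.9", "5", "5.4", "5.4", "5.3"]
print([read_wd(f, 1) for f in raw])  # like 3.1: [4.9, 4.9, 0.5, 5.4, 5.4, 5.3]
print([read_wd(f, 0) for f in raw])  # like 8.0: [4.9, 4.9, 5.0, 5.4, 5.4, 5.3]
print(read_wd("1999", 2))            # like 12.2: 1999 cents -> 19.99
```

Only the value 5 changes under d=1, because it is the only field without a decimal point.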
12-09-2016 12:51 PM
Not a bug, but a hangover from old processes that used implied decimals: to save storage space, the input data sets did not actually contain the decimal points.
Also, informats given in an INPUT statement behave a bit differently than informats assigned in an INFORMAT statement. See this as an alternative:
data Expenses;
   informat Country $40. institute_type $40. direct_expenditure_type $8.
            a1995data 3.1 a2000data 3.1 a2005data 3.1
            a2009data 3.1 a2010data 3.1 a2011data 3.1;
   input Country institute_type direct_expenditure_type
         a1995data a2000data a2005data a2009data a2010data a2011data;
   datalines;
Something something type 4.9 4.9 5 5.4 5.4 5.3
;
run;