Solved: Re: problem in read numeric value

France · Posted 10-01-2018 08:07 AM

Dear all,

I read the data by using following codes,

proc format library=work;
invalue twodot
'..' = .
other = [32.16]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

however, the value in original data as 160599998500 shows in the new table as 0.00001606.

the value in original data as 15537999900 shows in the new table as 1.5538E-6.

Could you please give me some suggestions?

the attachment is the original data.

thanks in advance.

Kurt_Bremser · Posted 10-01-2018 09:48 AM

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 10-01-2018 08:32 AM

When using an informat, it is usually not necessary (and in fact causes problems) to specify the fractional part.

See the documentation: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=leforinforref&docsetTarge...

By using 32.16, you forced SAS to divide all values by 10**16 (unless a decimal point is present in the input data, as that overrides the d in the w.d informat).

Do this instead:

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

and use the colon modifier when reading, or you'll override the delimiters:

data GDP;
infile datalines DLM=',' DSD truncover ;
input 
  val1 :twodot.
  val2 :twodot.
;
datalines;
..,8013233121.55065
185470260100,..
;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

France · Posted 10-01-2018 09:28 AM

Dear Kurt,

thanks for your suggestions, and apologies for missing information。There is a decimal point is present in the input data. By using the following codes

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

however, the value '5.74563529200682,2.94859680156875,1.10493826182693,-1.2515966446984' original data shows in the viewtable is '5.745635292, 2.9485968016, 1.1049382618, -1.251596645'

could you please give me more suggestions about it?

besides, I use missover rather than turnover, because the length of each line is different. is it ok?

thanks in advance.

Kurt_Bremser · Posted 10-01-2018 09:48 AM

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

problem in read numeric value

Re: problem in read numeric value

Re: problem in read numeric value

Re: problem in read numeric value

Re: problem in read numeric value

Registration is open

SAS Training: Just a Click Away