BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
France
Quartz | Level 8

Dear all,

 

I read the data by using following codes,

proc format library=work;
invalue twodot
'..' = .
other = [32.16]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

 however, the value in original data as 160599998500 shows in the new table as 0.00001606.

the value in original data as 15537999900 shows in the new table as 1.5538E-6.

Could you please give me some suggestions? 

the attachment is the original data.

 

thanks in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

 

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

When using an informat, it is usually not necessary (and in fact causes problems) to specify the fractional part.

See the documentation: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=leforinforref&docsetTarge...

By using 32.16, you forced SAS to divide all values by 10**16 (unless a decimal point is present in the input data, as that overrides the d in the w.d informat).

Do this instead:

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

and use the colon modifier when reading, or you'll override the delimiters:

data GDP;
infile datalines DLM=',' DSD truncover ;
input 
  val1 :twodot.
  val2 :twodot.
;
datalines;
..,8013233121.55065
185470260100,..
;
run;
France
Quartz | Level 8

Dear Kurt,

 

thanks for your suggestions, and apologies for missing information。There is a decimal point is present in the input data. By using the following codes

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

however, the value '5.74563529200682,2.94859680156875,1.10493826182693,-1.2515966446984' original data shows in the viewtable is '5.745635292, 2.9485968016, 1.1049382618, -1.251596645'

 

could you please give me more suggestions about it?

 

besides, I use missover rather than turnover, because the length of each line is different. is it ok?

 

thanks in advance.

 

 

 

 

Kurt_Bremser
Super User

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

 

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 1222 views
  • 0 likes
  • 2 in conversation