BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
France
Quartz | Level 8

Dear all,

 

I read the data by using following codes,

proc format library=work;
invalue twodot
'..' = .
other = [32.16]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

 however, the value in original data as 160599998500 shows in the new table as 0.00001606.

the value in original data as 15537999900 shows in the new table as 1.5538E-6.

Could you please give me some suggestions? 

the attachment is the original data.

 

thanks in advance.

 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

 

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

View solution in original post

3 REPLIES 3
Kurt_Bremser
Super User

When using an informat, it is usually not necessary (and in fact causes problems) to specify the fractional part.

See the documentation: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=leforinforref&docsetTarge...

By using 32.16, you forced SAS to divide all values by 10**16 (unless a decimal point is present in the input data, as that overrides the d in the w.d informat).

Do this instead:

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

and use the colon modifier when reading, or you'll override the delimiters:

data GDP;
infile datalines DLM=',' DSD truncover ;
input 
  val1 :twodot.
  val2 :twodot.
;
datalines;
..,8013233121.55065
185470260100,..
;
run;
France
Quartz | Level 8

Dear Kurt,

 

thanks for your suggestions, and apologies for missing information。There is a decimal point is present in the input data. By using the following codes

proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
  input 
        Country_Name :$50. 
        Country_Code :$29. 
        Series_Name :$250. 
        Series_Code :$250.  
        YR1980 :twodot. 
        YR1981 :twodot.
        YR1982 :twodot. 
        YR1983 :twodot.
        YR1984 :twodot.
        YR1985 :twodot. 
        YR1986 :twodot. 
        YR1987 :twodot. 
        YR1991 :twodot.  
        YR1992 :twodot.  
        YR1993 :twodot. 
        YR1994 :twodot.  
        YR1995 :twodot.  
        YR1996 :twodot.  
        YR1997 :twodot.  
        YR1998 :twodot.  
        YR1999 :twodot.  
        YR2000 :twodot.  
        YR2001 :twodot.  
        YR2002 :twodot.  
        YR2006 :twodot.  
        YR2007 :twodot.  
        YR2008 :twodot.   
        YR2009 :twodot.  
        YR2010 :twodot.  
        YR2011 :twodot.   
        YR2012 :twodot.  
        YR2013 :twodot.   
        YR2014 :twodot.  
        YR2015 :twodot.  
        YR2016 :twodot.   
        YR2017 :twodot.   
;

run;

however, the value '5.74563529200682,2.94859680156875,1.10493826182693,-1.2515966446984' original data shows in the viewtable is '5.745635292, 2.9485968016, 1.1049382618, -1.251596645'

 

could you please give me more suggestions about it?

 

besides, I use missover rather than turnover, because the length of each line is different. is it ok?

 

thanks in advance.

 

 

 

 

Kurt_Bremser
Super User

You should always use truncover instead of missover.

If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.

 

With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:

data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input 
  Country_Name :$50. 
  Country_Code :$29. 
  Series_Name :$250. 
  Series_Code :$250.  
  (YR1980-YR2017) (:twodot.) 
;
format YR1980-YR2017 32.16;
run;

Note that my next step would be to normalize the data:

proc transpose
  data=sa_step1.gdp
  out=sa_step1.gdp_trans (
    compress=yes
    rename=(col1=gdp)
  )
;
by
  Country_Name notsorted
  Country_Code notsorted
  Series_Name notsorted
  Series_Code notsorted
;
var YR:;
run;

data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;

I use the compress option because of the long strings contained.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 618 views
  • 0 likes
  • 2 in conversation