Dear all,
I read the data by using following codes,
proc format library=work;
invalue twodot
'..' = .
other = [32.16]
;
run;
data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input
Country_Name :$50.
Country_Code :$29.
Series_Name :$250.
Series_Code :$250.
YR1980 :twodot.
YR1981 :twodot.
YR1982 :twodot.
YR1983 :twodot.
YR1984 :twodot.
YR1985 :twodot.
YR1986 :twodot.
YR1987 :twodot.
YR1991 :twodot.
YR1992 :twodot.
YR1993 :twodot.
YR1994 :twodot.
YR1995 :twodot.
YR1996 :twodot.
YR1997 :twodot.
YR1998 :twodot.
YR1999 :twodot.
YR2000 :twodot.
YR2001 :twodot.
YR2002 :twodot.
YR2006 :twodot.
YR2007 :twodot.
YR2008 :twodot.
YR2009 :twodot.
YR2010 :twodot.
YR2011 :twodot.
YR2012 :twodot.
YR2013 :twodot.
YR2014 :twodot.
YR2015 :twodot.
YR2016 :twodot.
YR2017 :twodot.
;
run;
however, the value in original data as 160599998500 shows in the new table as 0.00001606.
the value in original data as 15537999900 shows in the new table as 1.5538E-6.
Could you please give me some suggestions?
the attachment is the original data.
thanks in advance.
You should always use truncover instead of missover.
If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.
With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:
data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input
Country_Name :$50.
Country_Code :$29.
Series_Name :$250.
Series_Code :$250.
(YR1980-YR2017) (:twodot.)
;
format YR1980-YR2017 32.16;
run;
Note that my next step would be to normalize the data:
proc transpose
data=sa_step1.gdp
out=sa_step1.gdp_trans (
compress=yes
rename=(col1=gdp)
)
;
by
Country_Name notsorted
Country_Code notsorted
Series_Name notsorted
Series_Code notsorted
;
var YR:;
run;
data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;
I use the compress option because of the long strings contained.
When using an informat, it is usually not necessary (and in fact causes problems) to specify the fractional part.
See the documentation: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=leforinforref&docsetTarge...
By using 32.16, you forced SAS to divide all values by 10**16 (unless a decimal point is present in the input data, as that overrides the d in the w.d informat).
Do this instead:
proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;
and use the colon modifier when reading, or you'll override the delimiters:
data GDP;
infile datalines DLM=',' DSD truncover ;
input
val1 :twodot.
val2 :twodot.
;
datalines;
..,8013233121.55065
185470260100,..
;
run;
Dear Kurt,
thanks for your suggestions, and apologies for missing information。There is a decimal point is present in the input data. By using the following codes
proc format library=work;
invalue twodot
'..' = .
other = [32.]
;
run;
data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input
Country_Name :$50.
Country_Code :$29.
Series_Name :$250.
Series_Code :$250.
YR1980 :twodot.
YR1981 :twodot.
YR1982 :twodot.
YR1983 :twodot.
YR1984 :twodot.
YR1985 :twodot.
YR1986 :twodot.
YR1987 :twodot.
YR1991 :twodot.
YR1992 :twodot.
YR1993 :twodot.
YR1994 :twodot.
YR1995 :twodot.
YR1996 :twodot.
YR1997 :twodot.
YR1998 :twodot.
YR1999 :twodot.
YR2000 :twodot.
YR2001 :twodot.
YR2002 :twodot.
YR2006 :twodot.
YR2007 :twodot.
YR2008 :twodot.
YR2009 :twodot.
YR2010 :twodot.
YR2011 :twodot.
YR2012 :twodot.
YR2013 :twodot.
YR2014 :twodot.
YR2015 :twodot.
YR2016 :twodot.
YR2017 :twodot.
;
run;
however, the value '5.74563529200682,2.94859680156875,1.10493826182693,-1.2515966446984' original data shows in the viewtable is '5.745635292, 2.9485968016, 1.1049382618, -1.251596645'
could you please give me more suggestions about it?
besides, I use missover rather than turnover, because the length of each line is different. is it ok?
thanks in advance.
You should always use truncover instead of missover.
If you try to read 5 bytes but only have three characters in the last variable, missover will discard everything, while truncover gets you the three characters.
With sufficiently long numbers, use a proper display format; if you have lots of similar variables, SAS can help you with variable lists:
data sa_step1.GDP;
infile 'F:\Dataset\Control variables\Source\1.GDP World Development Indicators\eca1f3cd-082d-4f10-a504-2319bf5dbe50_Data.csv' DLM=',' DSD missover lrecl = 32767 firstobs =2 ;
input
Country_Name :$50.
Country_Code :$29.
Series_Name :$250.
Series_Code :$250.
(YR1980-YR2017) (:twodot.)
;
format YR1980-YR2017 32.16;
run;
Note that my next step would be to normalize the data:
proc transpose
data=sa_step1.gdp
out=sa_step1.gdp_trans (
compress=yes
rename=(col1=gdp)
)
;
by
Country_Name notsorted
Country_Code notsorted
Series_Name notsorted
Series_Code notsorted
;
var YR:;
run;
data sa_step1.gdp_final (compress=yes);
set sa_step1.gdp_trans;
year = input(substr(_name_,3),4.);
drop _name_;
run;
I use the compress option because of the long strings contained.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.