Hey,
I was trying to import a CSV file with CPI data from World Bank, from here: https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG
and importing a file named 'API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv'.
The problem I reckon is that there are few delimiters, including space, tab and comma.
Whatever I specify in a delimiter option I never get the expecting results. For instance:
proc import datafile="/home/sasuser.v94/pliki/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv"
out=inflation_wb
dbms=dlm
replace;
delimiter=' ,"';
getnames=no;
run;
I will recieve an output with 74 columns and no data.
When I specify delimiter=' ,'; then I will get the ouput with all data packed in one column.
Anyone knows by any chance, how can I import this csv and get it properly displayed?
And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):
data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;
proc sort data=wide;
by country_code;
run;
proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;
data cntlin;
set wide (
keep=country_code country_name
rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;
proc format cntlin=cntlin;
run;
data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;
Don't bother with proc import, write the data step yourself. You have country name, country code, two redundant columns you can drop anyway, and then numeric columns for 1960-2019. And the data starts in line 6, with line 5 holding the header, which might be hard for proc import to catch.
data want;
infile "......csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
run;
And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):
data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;
proc sort data=wide;
by country_code;
run;
proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;
data cntlin;
set wide (
keep=country_code country_name
rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;
proc format cntlin=cntlin;
run;
data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;
Thank you kind Sir, I will have to work on your code then to fully understand it.
@nowak22 wrote:
Thank you kind Sir, I will have to work on your code then to fully understand it.
The first step reads the file into a dataset; to make it easier to code, I used a variable list (_1960-_2019), which creates all variables in the sequence automatically.
Next, I sorted this dataset to facilitate the use of BY in the transpose step.
The transpose converts the horizontal structure to a vertical structure, removes the country name, and removes all missing values by means of a WHERE= dataset option.
Then I extract the country code and country name and rename them, add the necessary variables for a "controlin" dataset for PROC FORMAT, and execute PROC FORMAT to create a format that maps from country code to country name.
In the final step, I ceate a numeric YEAR variable from the variable _NAME_, which was automatically created in the transpose step. In this step, I also use the newly created format.
If the resulting dataset is to be used later, you should add the format to a permanent format catalog that is in your FMTSEARCH system option.
Join us for SAS Innovate 2025, our biggest and most exciting global event of the year, in Orlando, FL, from May 6-9. Sign up by March 14 for just $795.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.