Solved: Importing an excel file from world bank with all data in a single colu...

nowak22 · Posted 05-03-2020 09:00 AM

Hey,

I was trying to import a CSV file with CPI data from World Bank, from here: https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG

and importing a file named 'API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv'.

The problem I reckon is that there are few delimiters, including space, tab and comma.

Whatever I specify in a delimiter option I never get the expecting results. For instance:

proc import datafile="/home/sasuser.v94/pliki/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv"
        out=inflation_wb
        dbms=dlm
        replace;
        delimiter=' ,"';
        getnames=no;
run;

I will recieve an output with 74 columns and no data.

When I specify delimiter=' ,'; then I will get the ouput with all data packed in one column.

Anyone knows by any chance, how can I import this csv and get it properly displayed?

Kurt_Bremser · Posted 05-03-2020 10:11 AM

And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):

data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;

proc sort data=wide;
by country_code;
run;

proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;

data cntlin;
set wide (
  keep=country_code country_name
  rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;

proc format cntlin=cntlin;
run;

data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

View solution in original post

Kurt_Bremser · Posted 05-03-2020 09:32 AM

Don't bother with proc import, write the data step yourself. You have country name, country code, two redundant columns you can drop anyway, and then numeric columns for 1960-2019. And the data starts in line 6, with line 5 holding the header, which might be hard for proc import to catch.

data want;
infile "......csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Kurt_Bremser · Posted 05-03-2020 10:11 AM

And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):

data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;

proc sort data=wide;
by country_code;
run;

proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;

data cntlin;
set wide (
  keep=country_code country_name
  rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;

proc format cntlin=cntlin;
run;

data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

nowak22 · Posted 05-03-2020 10:38 AM

Thank you kind Sir, I will have to work on your code then to fully understand it.

Kurt_Bremser · Posted 05-04-2020 11:01 AM

@nowak22 wrote:

Thank you kind Sir, I will have to work on your code then to fully understand it.

The first step reads the file into a dataset; to make it easier to code, I used a variable list (_1960-_2019), which creates all variables in the sequence automatically.

Next, I sorted this dataset to facilitate the use of BY in the transpose step.

The transpose converts the horizontal structure to a vertical structure, removes the country name, and removes all missing values by means of a WHERE= dataset option.

Then I extract the country code and country name and rename them, add the necessary variables for a "controlin" dataset for PROC FORMAT, and execute PROC FORMAT to create a format that maps from country code to country name.

In the final step, I ceate a numeric YEAR variable from the variable _NAME_, which was automatically created in the transpose step. In this step, I also use the newly created format.

If the resulting dataset is to be used later, you should add the format to a permanent format catalog that is in your FMTSEARCH system option.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Re: Importing an excel file from world bank with all data in a single column.

Register Today!

SAS Training: Just a Click Away