BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
nowak22
Fluorite | Level 6

Hey,

I was trying to import a CSV file with CPI data from World Bank, from here: https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG

and importing a file named 'API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv'.

 

The problem I reckon is that there are few delimiters, including space, tab and comma.

Whatever I specify in a delimiter option I never get the expecting results. For instance: 

proc import datafile="/home/sasuser.v94/pliki/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv"
        out=inflation_wb
        dbms=dlm
        replace;
        delimiter=' ,"';
        getnames=no;
run;

I will recieve an output with 74 columns and no data.

When I specify delimiter=' ,'; then I will get the ouput with all data packed in one column.

 

Anyone knows by any chance, how can I import this csv and get it properly displayed? 

1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):

data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;

proc sort data=wide;
by country_code;
run;

proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;

data cntlin;
set wide (
  keep=country_code country_name
  rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;

proc format cntlin=cntlin;
run;

data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;

 

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Don't bother with proc import, write the data step yourself. You have country name, country code, two redundant columns you can drop anyway, and then numeric columns for 1960-2019. And the data starts in line 6, with line 5 holding the header, which might be hard for proc import to catch.

data want;
infile "......csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
run;
Kurt_Bremser
Super User

And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):

data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;

proc sort data=wide;
by country_code;
run;

proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;

data cntlin;
set wide (
  keep=country_code country_name
  rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;

proc format cntlin=cntlin;
run;

data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;

 

nowak22
Fluorite | Level 6

Thank you kind Sir, I will have to work on your code then to fully understand it.

Kurt_Bremser
Super User

@nowak22 wrote:

Thank you kind Sir, I will have to work on your code then to fully understand it.


The first step reads the file into a dataset; to make it easier to code, I used a variable list (_1960-_2019), which creates all variables in the sequence automatically.

Next, I sorted this dataset to facilitate the use of BY in the transpose step.

The transpose converts the horizontal structure to a vertical structure, removes the country name, and removes all missing values by means of a WHERE= dataset option.

Then I extract the country code and country name and rename them, add the necessary variables for a "controlin" dataset for PROC FORMAT, and execute PROC FORMAT to create a format that maps from country code to country name.

In the final step, I ceate a numeric YEAR variable from the variable _NAME_, which was automatically created in the transpose step. In this step, I also use the newly created format.

 

If the resulting dataset is to be used later, you should add the format to a permanent format catalog that is in your FMTSEARCH system option.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 925 views
  • 4 likes
  • 2 in conversation