Hey,
I was trying to import a CSV file with CPI data from World Bank, from here: https://data.worldbank.org/indicator/FP.CPI.TOTL.ZG
and importing a file named 'API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv'.
The problem I reckon is that there are few delimiters, including space, tab and comma.
Whatever I specify in a delimiter option I never get the expecting results. For instance:
proc import datafile="/home/sasuser.v94/pliki/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv"
out=inflation_wb
dbms=dlm
replace;
delimiter=' ,"';
getnames=no;
run;
I will recieve an output with 74 columns and no data.
When I specify delimiter=' ,'; then I will get the ouput with all data packed in one column.
Anyone knows by any chance, how can I import this csv and get it properly displayed?
And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):
data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;
proc sort data=wide;
by country_code;
run;
proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;
data cntlin;
set wide (
keep=country_code country_name
rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;
proc format cntlin=cntlin;
run;
data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;
Don't bother with proc import, write the data step yourself. You have country name, country code, two redundant columns you can drop anyway, and then numeric columns for 1960-2019. And the data starts in line 6, with line 5 holding the header, which might be hard for proc import to catch.
data want;
infile "......csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
run;
And this would be my final code to create a proper dataset with minimal use of space and an intelligent structure for further processing (code is tested on University Edition):
data wide;
infile "/folders/myfolders/API_FP.CPI.TOTL.ZG_DS2_en_csv_v2_988671.csv" dlm=',' dsd truncover firstobs=6;
input country_name :$40. country_code :$3. dummy :$1. dummy (_1960-_2019) (:20.);
drop dummy;
run;
proc sort data=wide;
by country_code;
run;
proc transpose data=wide out=long (rename=(col1=income) where=(income ne .));
by country_code;
var _:;
run;
data cntlin;
set wide (
keep=country_code country_name
rename=(country_code=start country_name=label)
);
retain fmtname "country" type "C";
run;
proc format cntlin=cntlin;
run;
data want;
set long;
year = input(substr(_name_,2),4.);
format country_code $country.;
drop _name_;
run;
Thank you kind Sir, I will have to work on your code then to fully understand it.
@nowak22 wrote:
Thank you kind Sir, I will have to work on your code then to fully understand it.
The first step reads the file into a dataset; to make it easier to code, I used a variable list (_1960-_2019), which creates all variables in the sequence automatically.
Next, I sorted this dataset to facilitate the use of BY in the transpose step.
The transpose converts the horizontal structure to a vertical structure, removes the country name, and removes all missing values by means of a WHERE= dataset option.
Then I extract the country code and country name and rename them, add the necessary variables for a "controlin" dataset for PROC FORMAT, and execute PROC FORMAT to create a format that maps from country code to country name.
In the final step, I ceate a numeric YEAR variable from the variable _NAME_, which was automatically created in the transpose step. In this step, I also use the newly created format.
If the resulting dataset is to be used later, you should add the format to a permanent format catalog that is in your FMTSEARCH system option.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.