I have attached a file that I want to import into SAS, but the female smokers and male smokers columns only show the first digit of the value in each row. Could you help?
This is my code:
PROC IMPORT
DATAFILE="C:\Users\admin\Desktop\Logiciels statistiques\Devoir individuel 3\OWID_COVID_DATA_2020.csv"
OUT=mylib_d3.owid_covid_data_2020
DBMS=csv
REPLACE;
FORMAT
iso_code $CHAR8.
date YYMMDD10.
new_cases BEST2.
new_deaths BEST3.
new_tests BEST12.
total_tests BEST12.
population BEST10.
median_age BEST6.
aged_65_older BEST6.
aged_70_older BEST6.
cardiovasc_death_rate BEST6.
diabetes_prevalence BEST6.
female_smokers COMMA32.
male_smokers COMMA32.;
GETNAMES=yes;
RUN;
"Use the data step Luke"
data want;
infile "C:\Users\bart\Desktop\OWID_COVID_DATA_2020.csv" lrecl=1024 dsd dlm="," firstobs=2 missover;
input
iso_code : $ 8.
date : YYMMDD10.
(new_cases
new_deaths
new_tests
total_tests
population
median_age
aged_65_older
aged_70_older
cardiovasc_death_rate
diabetes_prevalence
female_smokers
male_smokers) (: BEST32.)
;
format date yymmdd10.; /* display the date as yyyy-mm-dd instead of a raw SAS date number */
run;
Bart
Hello @Feksan,
Add the statement
guessingrows=max;
before the RUN statement to improve the chance that everything is imported correctly. Note in PROC CONTENTS output of dataset mylib_d3.owid_covid_data_2020 how this changes the type, length, format and informat of several variables.
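As a rough sketch of that suggestion, the import step from the question might look like the following. It reuses the path and dataset name from the original post and leaves out the FORMAT statement, on the assumption that the goal is to see what PROC IMPORT decides on its own once it scans every row. The PROC CONTENTS step is only there to inspect the resulting types, lengths, formats and informats.

```sas
/* Sketch only: same file and output dataset as in the question. */
proc import
    datafile="C:\Users\admin\Desktop\Logiciels statistiques\Devoir individuel 3\OWID_COVID_DATA_2020.csv"
    out=mylib_d3.owid_covid_data_2020
    dbms=csv
    replace;
    getnames=yes;
    guessingrows=max;  /* scan all rows before deciding variable attributes */
run;

/* Check how female_smokers and male_smokers were defined. */
proc contents data=mylib_d3.owid_covid_data_2020;
run;
```

Scanning every row can make the import noticeably slower on large files, which is the price of the better guess.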
PROC IMPORT is going to GUESS how to define the variables and what informat to use to read them and what format to attach for displaying them.
It is generally easier, faster and more accurate to write your own data step to read a delimited text file.
data want;
infile "C:\downloads\&fname" dsd truncover firstobs=2;
input
iso_code :$3.
date :yymmdd.
new_cases
new_deaths
new_tests
total_tests
population
median_age
aged_65_older
aged_70_older
cardiovasc_death_rate
diabetes_prevalence
female_smokers
male_smokers
;
format date yymmdd10.;
run;
What happened is that PROC IMPORT by default examines only about the first 20 rows of data to set variable properties.
Your variables in question, female smokers and male smokers, are not populated in those first rows, so SAS defined them as character variables of length 1 ($1.), which is why only the first character of each value is read.
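To see this behaviour in isolation, here is a small, hypothetical demonstration. The file name guess_demo.csv, the dataset names guess_default and guess_max, and the test values are made up for illustration. It writes a CSV into the WORK directory in which female_smokers is empty for the first 22 rows, then imports it once with the default settings and once with guessingrows=max.

```sas
/* Hypothetical demo: build a small CSV whose second column is blank */
/* in the first 22 data rows and populated only afterwards.          */
%let demofile = %sysfunc(pathname(work))/guess_demo.csv;

data _null_;
    file "&demofile";
    length line $40;
    put "id,female_smokers";
    do id = 1 to 25;
        if id <= 22 then line = cats(id, ',');      /* empty value */
        else line = cats(id, ',12.5');              /* real value  */
        put line;
    end;
run;

/* Default guessing: only the first rows are examined. */
proc import datafile="&demofile" out=work.guess_default dbms=csv replace;
    getnames=yes;
run;

/* Scan every row before deciding on type and length. */
proc import datafile="&demofile" out=work.guess_max dbms=csv replace;
    getnames=yes;
    guessingrows=max;
run;

/* Compare how female_smokers was defined in each dataset. */
proc contents data=work.guess_default; run;
proc contents data=work.guess_max; run;
```

Comparing the two PROC CONTENTS listings should show female_smokers defined differently in the two datasets, which mirrors the truncation described in the question.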
You may also find that report-style data sets, such as the one that results from reading this data, are a bit hard to work with.