This is a follow up to an problem I posted yesterday: https://communities.sas.com/t5/Base-SAS-Programming/Importing-dataset-with-dates-as-column-headers/m...
I have a .csv file with State, County, and population from 2000-2007. See attached sample. I was able to successfully load the file using proc import, however, I did not notice the population data loaded as character values. I started over and decided to import the file by copying and modifying code from the log; however, that's not working either.
data WORK.POPULATION2001_2007 ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile 'C:\population_by_county_2001_2007.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat State $25. ;
informat County $30. ;
informat _1_1_2001 best12. ;
informat _1_1_2002 best12. ;
informat _1_1_2003 best12. ;
informat _1_1_2004 best12. ;
informat _1_1_2005 best12. ;
informat _1_1_2006 best12. ;
informat _1_1_2007 best12. ;
format State $25. ;
format County $30. ;
format _1_1_2001 comma12. ;
format _1_1_2002 comma12. ;
format _1_1_2003 comma12. ;
format _1_1_2004 comma12. ;
format _1_1_2005 comma12. ;
format _1_1_2006 comma12. ;
format _1_1_2007 comma12. ;
input
State $
County $
_1_1_2001 /*NOTE: I'll change the column header's later using a macro that was suggested in yesterday's post*/
_1_1_2002
_1_1_2003
_1_1_2004
_1_1_2005
_1_1_2006
_1_1_2007
;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
Here's a sample of the error message I'm getting:
NOTE: Invalid data for _1_1_2007 in line 20 95-100.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+---
20 Alabama,Coosa County,"11,746","11,539","11,423","11,156","10,964","10,857","10,783" 102
State=Alabama County=Coosa County _7_1_2001=. _7_1_2002=. _7_1_2003=. _7_1_2004=.
_7_1_2005=. _7_1_2006=. _7_1_2007=. _ERROR_=1 _N_=19
For one thing, it appears my data still has quotes around it. I thought having DSD in the infile statement was supposed to help with issues such as these. Any thoughts on what I'm doing wrong?
Also, is there a better way to change data from character to numeric?
I am a student who is new to SAS, so while there might be something that's glaring obvious to most, it may not be to me -- at least not yet 😉
You should use the comma informat for your numeric variables.
What you see in the line here:
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--- 20 Alabama,Coosa County,"11,746","11,539","11,423","11,156","10,964","10,857","10,783"
is not your SAS data, but the line from the infile "as is", so you will see the double quotes as they are delivered.
And the line for Broome county in your example text file misses the closing double quote for _01_01_2007.
This program:
data test;
infile "$HOME/sascommunity/population_by_county_2001_2007.txt" dlm=',' dsd truncover firstobs=2;
informat
State $25.
County $30.
_1_1_2001
_1_1_2002
_1_1_2003
_1_1_2004
_1_1_2005
_1_1_2006
_1_1_2007
comma12.
;
input
State
County
_1_1_2001
_1_1_2002
_1_1_2003
_1_1_2004
_1_1_2005
_1_1_2006
_1_1_2007
;
run;
proc print data=test;
run;
delivered this result:
_1_1_ _1_1_ _1_1_ _1_1_ _1_1_ _1_1_ _1_1_ Obs State County 2001 2002 2003 2004 2005 2006 2007 1 New York Albany 295378 296644 298491 298819 298605 299033 298246 2 New York Broome 199999 199714 198364 197463 196103 195700 . 3 New York Chemung 90673 90409 89816 89366 88351 88164 88058
In your previous, yesterday, post you have attached the sample_population.csv file.
In my response you got the code to read the file as is, option to rename the population variables and
convert them to numeric. Have you tried the code ? It is attached again here with slight change to enable easy
define preffered prefix for population variables:
filename pop '/folders/myshortcuts/My_Folders/flat/samplefile-pop.csv'; /* adapt path and name */
%let from_year = 2000;
%let upto_year = 2007;
%let prefix = pop; /* change prefix to your preffered */
data want;
infile pop truncover firstobs=2;
input a_line $120.;
city = scan(a_line,1,',');
state = scan(a_line,2,',');
pos = index(a_line,strip(state));
pos = pos + lengthn(state) +2;
popx = compress(substr(a_line,pos),',');
array pop &prefix.&from_year - &prefix.&upto_year;
do i=1 to 8;
vx = scan(popx,i,'"'); put i= vx=;
pop(i) = input(vx,comma12.);
end;
keep city state &prefix.&from_year - &prefix.&upto_year;
run;
Shmuel, thank you. I had not tried the code. I only started to learn about marcro's last week, so I'm not familiar with a_line, trucover, and other items in the code (i.e., vx). If you have the time, and don't mind, would you kindly explain what they accomplish?
You will find explanation to my code in the attached documnent.
In any case you need more informatin on a SAS term like statement, function etc. you
can search by google for sas support <term> .
For INFILE statement you may look at:
https://support.sas.com/documentation/cdl/en/lestmtsref/63323/HTML/default/viewer.htm#n1rill4udj0tfu...
Shmuel,
Thank you SO much! Your explanation was better than anything i've come across online (or in class!). I feel more comfortable trying your approach now.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.