Morning -
I have a .csv data set (not created by me) that contains city, state and then population by year (i.e., 1/1/2015, 1/1/2016, etc...). When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015, _1_1_2016. I would prefer my column headers to be something like Y2015, Y2016, etc. As a way to get around this, I condsidered NOT importing the column headers and just later renaming them, but I'm wondering there's a better way to go about this.
Here's an example of my code:
proc import
datafile="C:\example.csv"
dbms=csv
out=work.newfile
replace;
getnames=no;
datarows=2;
run;
Next, I thought I could create a new table using a proc sql statement and renaming each of the date columns. I imagine there has to be a better way to go about this. Any advice?
I'll answer your questions in reverse.
first(name) eq '_' <<<---- is this calling on name created in line 2? No. I describe NAME, below.
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<----- what is memname?
dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.
MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile
So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).
select catx('=y',name, <<<---------
year(input(translate(substr(name,2),'/','_'),anydt
into :vnames separated by ' '
Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007
The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.
The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.
HTH,
Art, CEO, AnalystFinder.com
Transpose to a long format and then use scan to extract the year.
If you really want a wide format, transpose back to a long format using proc transpose and ID variable as the Year variable.
How many years' population do you have in the csv file ?
Can you post a sample of your csv file - just few rows to be used a test file ?
You wrote: When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015
I believe the header is created by PROC IMPORT and not throght data step with INFILE statement.
Using infile you have better control on variable names (headers) and data format.
You can adapt next code to your needs:
/* filename pop '...path and name.csv'; */
data want;
infile datalines /* pop */ dlm=',' truncover firstobs=2;
input city $ state $ pop2015 - pop2017;
format pop: comma15.;
datalines;
CITY, state,01/01/2015,01/01/2016,01/01/2017
city name a, st1, 200000, 200200, 202,500
mini town, st2, 5000, 4899, 5005
capital, vw, 10200500, 10220000, 101900800
;
run;
proc import writes the data step it creates to the log. Copy that from there and replace the column names.
If you need that process automated, you can read the original column names from the first line and then create a data step dynamically with call execute.
Please post some example lines from the csv.
Thanks all -
Attached is a sample of my data.
THAT is not a csv.
Please post the top three lines of the original csv file as text.
An Excel file (aside from the fact that it could contain malware) is utterly useless for writing a data step to import a csv. That should be obvious.
I saved the sample file as an excel file in error. The original IS a csv. Attached the first 3 lines of the original csv saved as text.
Attached is the code to be done in one step. Just adapt macro variables on top to desired years.
You can also change the prefix from POP to Y if you like.
filename pop '/folders/myshortcuts/My_Folders/flat/samplefile-pop.csv';
%let from_year = 2000;
%let upto_year = 2007;
data want;
infile pop truncover firstobs=2;
input a_line $120.;
city = scan(a_line,1,',');
state = scan(a_line,2,',');
pos = index(a_line,strip(state));
put city= state= pos=;
pos = pos + lengthn(state) +2;
popx = compress(substr(a_line,pos),','); put pos= popx=;
array pop pop&from_year - pop&upto_year;
do i=1 to 8;
vx = scan(popx,i,'"'); put i= vx=;
pop(i) = input(vx,comma12.);
end;
keep city state pop&from_year - pop&upto_year;
run;
I like your idea of using SQL to rename the variables. Since you said it was a comma delimited file, I save your example file as such and accomplished the task as follows:
proc import datafile="C:\art\samplefile-pop.csv" dbms=csv out=work.newfile replace; getnames=yes; datarow=2; run; proc sql noprint; select catx('=y',name, year(input(translate(substr(name,2),'/','_'),anydtdte.))) into :vnames separated by ' ' from dictionary.columns where libname='WORK' and memname='NEWFILE' and first(name) eq '_' ; quit; data newfile; set newfile (rename=(&vnames.)); run;
HTH,
Art, CEO, AnalystFinder.com
Art297, thank you!
I'm a student and just learning SAS, so can you help me understand the proc sql statement you provided? It appears you're creating a macro (I'm just learnning macros and find them somewhat confusing). Can you explain what the sections with arrows are doing?
proc sql noprint;
select catx('=y',name, <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<----- what is memname?
first(name) eq '_' <<<---- is this calling on name created in line 2?
I'll answer your questions in reverse.
first(name) eq '_' <<<---- is this calling on name created in line 2? No. I describe NAME, below.
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<----- what is memname?
dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.
MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile
So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).
select catx('=y',name, <<<---------
year(input(translate(substr(name,2),'/','_'),anydt
into :vnames separated by ' '
Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007
The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.
The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.
HTH,
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.