DATA Step, Macro, Functions and more

Importing dataset with dates as column headers

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 13
Accepted Solution

Importing dataset with dates as column headers

Morning - 

 

I have a .csv data set (not created by me) that contains city, state and then population by year (i.e., 1/1/2015, 1/1/2016, etc...).  When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015, _1_1_2016.  I would prefer my column headers to be something like Y2015, Y2016, etc.  As a way to get around this, I condsidered NOT importing the column headers and just later renaming them, but I'm wondering there's a better way to go about this.

 

Here's an example of my code: 

proc import
datafile="C:\example.csv"
dbms=csv
out=work.newfile
replace;
getnames=no;
datarows=2;
run;

 

Next, I thought I could create a new table using a proc sql statement and renaming each of the date columns.  I imagine there has to be a better way to go about this.  Any advice?


Accepted Solutions
Solution
‎02-19-2017 04:10 PM
PROC Star
Posts: 7,486

Re: Importing dataset with dates as column headers

I'll answer your questions in reverse.

 

first(name) eq '_'  <<<---- is this calling on name created in line 2? No. I describe NAME, below.

 

from dictionary.columns <<<<--------

where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?

 

dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.

MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile

 

So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).

 

select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '

 

Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007

 

The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.

 

The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.

 

HTH,

Art, CEO, AnalystFinder.com

 

View solution in original post


All Replies
Super User
Posts: 19,837

Re: Importing dataset with dates as column headers

Transpose to a long format and then use scan to extract the year. 

If you really want a wide format, transpose back to a long format using proc transpose and ID variable as the Year variable. 

Trusted Advisor
Posts: 1,577

Re: Importing dataset with dates as column headers

[ Edited ]

How many years' population do you have in the csv file ? 

Can you post a sample of your csv file - just few rows to be used a test file ?

 

You wrote: When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015

I believe the header is created by PROC IMPORT and not throght data step with INFILE statement.

Using infile you have better control on variable names (headers) and data format.

 

You can adapt next code to your needs:

/* filename pop '...path and name.csv'; */
data want;
    infile datalines /* pop */ dlm=',' truncover firstobs=2;
    input city $ state $ pop2015 - pop2017;
    format pop: comma15.;
datalines;
CITY, state,01/01/2015,01/01/2016,01/01/2017
city name a, st1, 200000, 200200, 202,500
mini town, st2, 5000, 4899, 5005
capital, vw, 10200500, 10220000, 101900800
;
run;

 

Super User
Posts: 7,832

Re: Importing dataset with dates as column headers

proc import writes the data step it creates to the log. Copy that from there and replace the column names.

If you need that process automated, you can read the original column names from the first line and then create a data step dynamically with call execute.

Please post some example lines from the csv.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 13

Re: Importing dataset with dates as column headers

Thanks all -

 

Attached is a sample of my data.

Super User
Posts: 7,832

Re: Importing dataset with dates as column headers

THAT is not a csv.

Please post the top three lines of the original csv file as text.

An Excel file (aside from the fact that it could contain malware) is utterly useless for writing a data step to import a csv. That should be obvious.

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Occasional Contributor
Posts: 13

Re: Importing dataset with dates as column headers

Posted in reply to KurtBremser

I saved the sample file as an excel file in error.  The original IS a csv.  Attached the first 3 lines of the original csv saved as text.

Trusted Advisor
Posts: 1,577

Re: Importing dataset with dates as column headers

Attached is the code to be done in one step. Just adapt macro variables on top to desired years.

You can also change the prefix from POP to Y if you like.

filename pop '/folders/myshortcuts/My_Folders/flat/samplefile-pop.csv'; 
%let from_year = 2000;
%let upto_year = 2007;
data want;
    infile pop truncover firstobs=2;
    input a_line $120.;
    
    city = scan(a_line,1,',');
    state = scan(a_line,2,',');
    
    pos = index(a_line,strip(state)); 
 put city= state=  pos=;
    pos = pos + lengthn(state) +2; 
    popx = compress(substr(a_line,pos),','); put pos= popx=;
    
    array pop pop&from_year - pop&upto_year;
    
    do i=1 to 8;
       vx = scan(popx,i,'"'); put i= vx=;
       pop(i) = input(vx,comma12.);
    end;
    keep city state pop&from_year - pop&upto_year;
run;
PROC Star
Posts: 7,486

Re: Importing dataset with dates as column headers

I like your idea of using SQL to rename the variables. Since you said it was a comma delimited file, I save your example file as such and accomplished the task as follows:

 

proc import
    datafile="C:\art\samplefile-pop.csv"
    dbms=csv
    out=work.newfile
    replace;
  getnames=yes;
  datarow=2;
run;

proc sql noprint;
  select catx('=y',name,
    year(input(translate(substr(name,2),'/','_'),anydtdte.)))
     into :vnames separated by ' '
       from dictionary.columns
         where libname='WORK' and
               memname='NEWFILE' and
               first(name) eq '_'
  ;
quit;

data newfile;
  set newfile (rename=(&vnames.));
run;

HTH,

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 13

Re: Importing dataset with dates as column headers

[ Edited ]

Art297,  thank you!

 

I'm a student and just learning SAS, so can you help me understand  the proc sql statement you provided? It appears you're creating a macro (I'm just learnning macros and find them somewhat confusing).  Can you explain what the sections with arrows are doing?

 

proc sql noprint;
select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?
first(name) eq '_'  <<<---- is this calling on name created in line 2?

 

 

Solution
‎02-19-2017 04:10 PM
PROC Star
Posts: 7,486

Re: Importing dataset with dates as column headers

I'll answer your questions in reverse.

 

first(name) eq '_'  <<<---- is this calling on name created in line 2? No. I describe NAME, below.

 

from dictionary.columns <<<<--------

where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?

 

dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.

MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile

 

So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).

 

select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '

 

Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007

 

The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.

 

The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.

 

HTH,

Art, CEO, AnalystFinder.com

 

Occasional Contributor
Posts: 13

Re: Importing dataset with dates as column headers

Thank you for taking the time to answer my questions in detail.
☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 11 replies
  • 276 views
  • 5 likes
  • 5 in conversation