Solved: Re: Importing dataset with dates as column headers

KDS_1113 · Posted 02-19-2017 12:20 PM

Morning -

I have a .csv data set (not created by me) that contains city, state and then population by year (i.e., 1/1/2015, 1/1/2016, etc...). When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015, _1_1_2016. I would prefer my column headers to be something like Y2015, Y2016, etc. As a way to get around this, I condsidered NOT importing the column headers and just later renaming them, but I'm wondering there's a better way to go about this.

Here's an example of my code:

proc import
datafile="C:\example.csv"
dbms=csv
out=work.newfile
replace;
getnames=no;
datarows=2;
run;

Next, I thought I could create a new table using a proc sql statement and renaming each of the date columns. I imagine there has to be a better way to go about this. Any advice?

art297 · Posted 02-19-2017 02:54 PM

I'll answer your questions in reverse.

first(name) eq '_' <<<---- is this calling on name created in line 2? No. I describe NAME, below.

from dictionary.columns <<<<--------

where libname='WORK' and
memname='NEWFILE' and <<<<----- what is memname?

dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.

MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile

So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).

select catx('=y',name, <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '

Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007

The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.

The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.

HTH,

Art, CEO, AnalystFinder.com

View solution in original post

Reeza · Posted 02-19-2017 12:42 PM

Transpose to a long format and then use scan to extract the year.

If you really want a wide format, transpose back to a long format using proc transpose and ID variable as the Year variable.

Shmuel · Posted 02-19-2017 12:59 PM

How many years' population do you have in the csv file ?

Can you post a sample of your csv file - just few rows to be used a test file ?

You wrote: When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015

I believe the header is created by PROC IMPORT and not throght data step with INFILE statement.

Using infile you have better control on variable names (headers) and data format.

You can adapt next code to your needs:

/* filename pop '...path and name.csv'; */
data want;
    infile datalines /* pop */ dlm=',' truncover firstobs=2;
    input city $ state $ pop2015 - pop2017;
    format pop: comma15.;
datalines;
CITY, state,01/01/2015,01/01/2016,01/01/2017
city name a, st1, 200000, 200200, 202,500
mini town, st2, 5000, 4899, 5005
capital, vw, 10200500, 10220000, 101900800
;
run;

Kurt_Bremser · Posted 02-19-2017 01:22 PM

proc import writes the data step it creates to the log. Copy that from there and replace the column names.

If you need that process automated, you can read the original column names from the first line and then create a data step dynamically with call execute.

Please post some example lines from the csv.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

KDS_1113 · Posted 02-19-2017 01:27 PM

Thanks all -

Attached is a sample of my data.

Kurt_Bremser · Posted 02-19-2017 01:40 PM

THAT is not a csv.

Please post the top three lines of the original csv file as text.

An Excel file (aside from the fact that it could contain malware) is utterly useless for writing a data step to import a csv. That should be obvious.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

KDS_1113 · Posted 02-19-2017 02:10 PM

I saved the sample file as an excel file in error. The original IS a csv. Attached the first 3 lines of the original csv saved as text.

Shmuel · Posted 02-19-2017 03:09 PM

Attached is the code to be done in one step. Just adapt macro variables on top to desired years.

You can also change the prefix from POP to Y if you like.

filename pop '/folders/myshortcuts/My_Folders/flat/samplefile-pop.csv'; 
%let from_year = 2000;
%let upto_year = 2007;
data want;
    infile pop truncover firstobs=2;
    input a_line $120.;
    
    city = scan(a_line,1,',');
    state = scan(a_line,2,',');
    
    pos = index(a_line,strip(state)); 
 put city= state=  pos=;
    pos = pos + lengthn(state) +2; 
    popx = compress(substr(a_line,pos),','); put pos= popx=;
    
    array pop pop&from_year - pop&upto_year;
    
    do i=1 to 8;
       vx = scan(popx,i,'"'); put i= vx=;
       pop(i) = input(vx,comma12.);
    end;
    keep city state pop&from_year - pop&upto_year;
run;

art297 · Posted 02-19-2017 01:46 PM

I like your idea of using SQL to rename the variables. Since you said it was a comma delimited file, I save your example file as such and accomplished the task as follows:

proc import
    datafile="C:\art\samplefile-pop.csv"
    dbms=csv
    out=work.newfile
    replace;
  getnames=yes;
  datarow=2;
run;

proc sql noprint;
  select catx('=y',name,
    year(input(translate(substr(name,2),'/','_'),anydtdte.)))
     into :vnames separated by ' '
       from dictionary.columns
         where libname='WORK' and
               memname='NEWFILE' and
               first(name) eq '_'
  ;
quit;

data newfile;
  set newfile (rename=(&vnames.));
run;

HTH,

Art, CEO, AnalystFinder.com

KDS_1113 · Posted 02-19-2017 02:22 PM

Art297, thank you!

I'm a student and just learning SAS, so can you help me understand the proc sql statement you provided? It appears you're creating a macro (I'm just learnning macros and find them somewhat confusing). Can you explain what the sections with arrows are doing?

proc sql noprint;
select catx('=y',name, <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<----- what is memname?
first(name) eq '_' <<<---- is this calling on name created in line 2?

art297 · Posted 02-19-2017 02:54 PM