BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
KDS_1113
Obsidian | Level 7

Morning - 

 

I have a .csv data set (not created by me) that contains city, state and then population by year (i.e., 1/1/2015, 1/1/2016, etc...).  When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015, _1_1_2016.  I would prefer my column headers to be something like Y2015, Y2016, etc.  As a way to get around this, I condsidered NOT importing the column headers and just later renaming them, but I'm wondering there's a better way to go about this.

 

Here's an example of my code: 

proc import
datafile="C:\example.csv"
dbms=csv
out=work.newfile
replace;
getnames=no;
datarows=2;
run;

 

Next, I thought I could create a new table using a proc sql statement and renaming each of the date columns.  I imagine there has to be a better way to go about this.  Any advice?

1 ACCEPTED SOLUTION

Accepted Solutions
art297
Opal | Level 21

I'll answer your questions in reverse.

 

first(name) eq '_'  <<<---- is this calling on name created in line 2? No. I describe NAME, below.

 

from dictionary.columns <<<<--------

where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?

 

dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.

MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile

 

So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).

 

select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '

 

Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007

 

The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.

 

The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.

 

HTH,

Art, CEO, AnalystFinder.com

 

View solution in original post

11 REPLIES 11
Reeza
Super User

Transpose to a long format and then use scan to extract the year. 

If you really want a wide format, transpose back to a long format using proc transpose and ID variable as the Year variable. 

Shmuel
Garnet | Level 18

How many years' population do you have in the csv file ? 

Can you post a sample of your csv file - just few rows to be used a test file ?

 

You wrote: When I import the data into sas using the infile statement, it coverts the headers to _1_1_2015

I believe the header is created by PROC IMPORT and not throght data step with INFILE statement.

Using infile you have better control on variable names (headers) and data format.

 

You can adapt next code to your needs:

/* filename pop '...path and name.csv'; */
data want;
    infile datalines /* pop */ dlm=',' truncover firstobs=2;
    input city $ state $ pop2015 - pop2017;
    format pop: comma15.;
datalines;
CITY, state,01/01/2015,01/01/2016,01/01/2017
city name a, st1, 200000, 200200, 202,500
mini town, st2, 5000, 4899, 5005
capital, vw, 10200500, 10220000, 101900800
;
run;

 

Kurt_Bremser
Super User

proc import writes the data step it creates to the log. Copy that from there and replace the column names.

If you need that process automated, you can read the original column names from the first line and then create a data step dynamically with call execute.

Please post some example lines from the csv.

KDS_1113
Obsidian | Level 7

Thanks all -

 

Attached is a sample of my data.

Kurt_Bremser
Super User

THAT is not a csv.

Please post the top three lines of the original csv file as text.

An Excel file (aside from the fact that it could contain malware) is utterly useless for writing a data step to import a csv. That should be obvious.

KDS_1113
Obsidian | Level 7

I saved the sample file as an excel file in error.  The original IS a csv.  Attached the first 3 lines of the original csv saved as text.

Shmuel
Garnet | Level 18

Attached is the code to be done in one step. Just adapt macro variables on top to desired years.

You can also change the prefix from POP to Y if you like.

filename pop '/folders/myshortcuts/My_Folders/flat/samplefile-pop.csv'; 
%let from_year = 2000;
%let upto_year = 2007;
data want;
    infile pop truncover firstobs=2;
    input a_line $120.;
    
    city = scan(a_line,1,',');
    state = scan(a_line,2,',');
    
    pos = index(a_line,strip(state)); 
 put city= state=  pos=;
    pos = pos + lengthn(state) +2; 
    popx = compress(substr(a_line,pos),','); put pos= popx=;
    
    array pop pop&from_year - pop&upto_year;
    
    do i=1 to 8;
       vx = scan(popx,i,'"'); put i= vx=;
       pop(i) = input(vx,comma12.);
    end;
    keep city state pop&from_year - pop&upto_year;
run;
art297
Opal | Level 21

I like your idea of using SQL to rename the variables. Since you said it was a comma delimited file, I save your example file as such and accomplished the task as follows:

 

proc import
    datafile="C:\art\samplefile-pop.csv"
    dbms=csv
    out=work.newfile
    replace;
  getnames=yes;
  datarow=2;
run;

proc sql noprint;
  select catx('=y',name,
    year(input(translate(substr(name,2),'/','_'),anydtdte.)))
     into :vnames separated by ' '
       from dictionary.columns
         where libname='WORK' and
               memname='NEWFILE' and
               first(name) eq '_'
  ;
quit;

data newfile;
  set newfile (rename=(&vnames.));
run;

HTH,

Art, CEO, AnalystFinder.com

 

KDS_1113
Obsidian | Level 7

Art297,  thank you!

 

I'm a student and just learning SAS, so can you help me understand  the proc sql statement you provided? It appears you're creating a macro (I'm just learnning macros and find them somewhat confusing).  Can you explain what the sections with arrows are doing?

 

proc sql noprint;
select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '
from dictionary.columns <<<<--------
where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?
first(name) eq '_'  <<<---- is this calling on name created in line 2?

 

 

art297
Opal | Level 21

I'll answer your questions in reverse.

 

first(name) eq '_'  <<<---- is this calling on name created in line 2? No. I describe NAME, below.

 

from dictionary.columns <<<<--------

where libname='WORK' and
memname='NEWFILE' and <<<<-----  what is memname?

 

dictionary.columns is a table that contains all of the column headers (i.e., variable names) for every sas dataset that is identified with a libname. One of its fields is called NAME, which holds the variable names.

MEMNAME is the field that contains the dataset names and libname is the field that contains the library names. Their values, in dictionary.columns, are ALL UPPERCASE. So, in this case, the code is getting the NAMEs of all of the variables in work.newfile

 

So, going back to first(name) eq '_', FIRST is a function that captures the first character of a variable's value. So, in this case, the where statement is selecting only those variables from work.newname whose variable name starts with a '_' character (namely, your date columns).

 

select catx('=y',name,  <<<---------
year(input(translate(substr(name,2),'/','_'),anydtdte.)))
into :vnames separated by ' '

 

Catx is a function that concatenates strings together, separating them by the first string listed. The into :vnames separated by ' ' creates a macro variable. So, the full statement is taking each name that begins with an underscore (i.e., _2000_01_01, _01_01_2001, _2002_01_01, _2003_01_01, _2004_01_01, _2005_01_01, _2006_01_01 and_2007_01_01), and converting them into one long string, namely _2000_01_01=y2000 _2001_01_01=y2001 _2002_01_01=y2002 _2003_01_01=y2003 _2004_01_01=y2004 _2005_01_01=y2005 _2006_01_01=y2006 _2007_01_01=y2007

 

The translate function is used to convert the underscores into forward slashes so that the anydtdte. informat will read the values as dates. The statement starts with the year function which extracts the years from the values.

 

The macro variable that is created simply contains the above string, which just happens to be in the exact same format that the recode option calls for.

 

HTH,

Art, CEO, AnalystFinder.com

 

KDS_1113
Obsidian | Level 7
Thank you for taking the time to answer my questions in detail.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 11 replies
  • 4406 views
  • 5 likes
  • 5 in conversation