Hi All,
There are more than 200 columns in my excel file. I want the excel column names as sas variable names and the first
row as sas labels. for the attached excel file, I want A,B,C,D as my sas dataset variables and Name, sex,height,weight
as their labels.
Thank you very much for your help!
Linlin,
Since the SAS assigned names in such cases are systematic, as are the Excel column names, can't you achieve what you want by using KSharp's proposed method and a slight bit of post-processing?
The following can probably be optimized (both code-wize and using proc datasets rather than a datastep), but it represents one way of accomplishing such renames:
/* create a sample dataset */
data have;
retain f1-f250 (250*1);
output;
run;
/* Build a renames macro variable */
proc sql noprint;
select
case
when input(substr(name,2),8.) < 27 then
name||"="||byte(input(substr(name,2),8.)+64)
when mod(input(substr(name,2),8.),26) eq 0 then
name||"="||
byte(floor((input(substr(name,2),8.)-1)/26)+64)||
byte(26+64)
else name||"="||
byte(floor((input(substr(name,2),8.)-1)/26)+64)||
byte(mod(input(substr(name,2),8.),26)+64)
end
into :renames separated by " "
from
dictionary.columns
where libname="WORK" and
memname="HAVE"
;
quit;
/* Rename the variables */
data want;
set have (rename=(&renames.));
run;
I am afraid that you need to process it after importing.
There is an option getnames=no for proc import, but adding labels need you do it by hand.
Ksharp
Hi Ksharp,
Thank you for your reply. When using "getnames=no" option , sas variables will be F1,F2 ....I want the variables as A,B,C D ......
Linlin,
Since the SAS assigned names in such cases are systematic, as are the Excel column names, can't you achieve what you want by using KSharp's proposed method and a slight bit of post-processing?
The following can probably be optimized (both code-wize and using proc datasets rather than a datastep), but it represents one way of accomplishing such renames:
/* create a sample dataset */
data have;
retain f1-f250 (250*1);
output;
run;
/* Build a renames macro variable */
proc sql noprint;
select
case
when input(substr(name,2),8.) < 27 then
name||"="||byte(input(substr(name,2),8.)+64)
when mod(input(substr(name,2),8.),26) eq 0 then
name||"="||
byte(floor((input(substr(name,2),8.)-1)/26)+64)||
byte(26+64)
else name||"="||
byte(floor((input(substr(name,2),8.)-1)/26)+64)||
byte(mod(input(substr(name,2),8.),26)+64)
end
into :renames separated by " "
from
dictionary.columns
where libname="WORK" and
memname="HAVE"
;
quit;
/* Rename the variables */
data want;
set have (rename=(&renames.));
run;
Dear Art,
Thank you very very much!!! This is exactly what I want.
Hi,
I have a similar issue with getnames. I have to append the csv files to data sets on daily basis. I have a macro to do this using proc import. Works fine as of now. Strangely, Some of the files have column headers at the end of the csv (only for few days).
I would like to know if there is a way to read the column headers at the end of the files(dynamically, either first row of the file or last row of the file).
I know we can do this using infile, by creating my own variables. But, using infile, my code will be very lengthy, and I wont be able to use a macro (as each file structure is different). I need this to read only few files. Usually the column headers will be in the firstrow itself. Thans for your help in advance.
Here is my code for import step only.
proc import datafile=csvTest
out= work.&csvfile /*&libto..&csvfile*/
dbms = csv replace;
delimiter= ",";
/* mixed=yes ;*/
getnames=yes;
There is no such option for CSV engine.
But For XLS engine there is an option namerow= to define which row contains variable names.
Data Source | Syntax | Valid Value | Default Value | In PROC IMPORT? | In PROC EXPORT? |
XLS | ENDCOL | Last column for data | Last column that contains data | Yes | No |
ENDNAMEROW | Last row for variable names | Same as NAMEROW | Yes | No | |
ENDROW | Last row for data | Last row that contains data | Yes | No | |
GETNAMES | Yes|No | Yes | Yes | No | |
NAMEROW | First row for variable names | First row that contains variable names | Yes | No | |
NEWFILE | Yes|No | No | No | Yes | |
PUTNAMES | Yes|No | Yes | No | Yes | |
RANGE | NAME | SHEET$UL:LR | First row | Yes | No | |
SHEET | Sheet name | First sheet | Yes | Yes | |
STARTCOL | First column for data | Last column that contains data | Yes | No | |
STARTROW | First row for data | First row that contains data | Yes | No |
Ksharp
Hi Sharp,
Thanks a lot for the information. Unfortunaly, I have to deal with CSV engine for now. I will defenitely keep the XLS option in the future.
I was told that there is a way to get the names from the csv files dynamically whether they are in the first row or the last row(column positions may change in the future), using _infile_ i.e. by reading the columns header line
I was planning to do something like this in 2 steps. Please let me know if you can provide an example for me or if you can let me know if I am dping it correctly.
filename readcsv path; /*for unix envt*/
data _null_;
if 'ID' then do;
if index(_infile_, 'readcsv') then do;
call symput(fountit,_N_);
call symput("inputstatement",trim(_infile_));
end;
end;
run;
data test;
input &inputstatement;
if _n_ = &fountit then delete;
run;
Thanks in advance.
Here is code to input use PROC IMPORT and skip the first line with the column names.
proc import out=test file=tmpfile1 dbms=dlm replace ;
delimiter=',';
datarow=2;
getnames=No;
run;
Here is code to read the names from the first row and generate LABEL statements.
filename rename temp;
data _null_;
infile tmpfile1 truncover dlm=',' dsd ;
file rename ;
length label $256 ;
do var=1 by 1 until (label=' ');
input label @;
if label=' ' then stop;
put 'label var' var '=' label : $quote. ';';
end;
run;
Here is an example of using PROC DATASETS to apply the labels.
proc datasets lib=work nolist;
modify test;
%inc rename ;
run;
quit;
Now if you want the variables names to be A to Z, AA to AZ, ... like Excel column names then you could generate rename statments to rename VAR1 to A, VAR2 to B, .... VAR27 to AA, etc.
What you are doing is to judge which row is a name row.
But I need a sample file or sample data to test. What data you have ?
Ksharp
Hello Sharp,
I have several csv files all of them with dirrefent names, and files structure.
Proc import will not work in my case as some files have date_time in charectar.
All the files have date_time and id as common variable. In the future new columns may be added very rarely.
after reading the files, as id is numeric, I am getting missing value. So, I am removing by saying if id = . then delete;
I would like to know how read the name row dynamically, wheather names are first or last row. read the row before bringing the data to SAS, and append the names based on comma dlm to the records.
date_time,id,address,abcdef,aeiou,test
2012-02-01 12:34:55,1234,jkhuc,uhibddeiui,iouyefbyj
2012-03-12 12:34:55,1234,jfccuc,uhibweddsiui,iouyffbyj
2012-02-12 12:34:55,1234,jkxehuc,uhibiui,iouybyj
2012-01-11 12:34:55,1234,jkhdduc,uhibideui,iouyfrbyj
2012-01-22 12:34:55,1234,jkhqqduc,uhiefbiui,ioudfgybyj
I am not sure what you mean. If your files are in random formats then PROC IMPORT is your friend. Otherwise you will need to spend time for each file deciding how many variables it has, the names of the variables, the type and length of the variables. PROC IMPORT will do that for you and get it right well over 80% of the time.
Now if you want to take a CSV file like your example and write a program to read it here are the steps I do.
1) Copy the row with the variable names into my program. Remove the commas prefix with INPUT and add formats based on what I think the type of the variable.
2) Insert the INPUT statement into a data step like this one.
data new ;
infile 'new.txt' dsd dlm=',' truncover firstobs=2 lrecl=10000 ;
input .... ;
run;
3) For some variables you might need to also add FORMAT and INFORMAT statements.
Here is the data step for the example data you posted.
data new ;
infile 'new.txt' dsd dlm=',' truncover firstobs=2 lrecl=10000 ;
informat date_time anydtdtm.;
format date_time datetime.;
length address $100 abcdef aeiou test $20 ;
input date_time id address abcdef aeiou test ;
run;
Hello Tom,
Thanks a bunch for both your responses. Sorry if I was not clear with my Q. Actually I did use proc import, and it did work for ~ 75% of the files(in my case). I have used the following macro to read the names dynamically. some files have the column headers at the EOF, instead of first row. This is the reason I was looking for an example to read the variable names dynamically regardless of top or bottom row, based on a common variable in all the files(in my case ID and date_time ). I have found a similar post in the blog, and I just have accommodate my variable name, read the entire column header, and use the _infile_ to read them dynamically so that I can accommodate the changes in the file structure in future.
https://communities.sas.com/message/13449#13449
Mycode:
%let path = "/sasdata/path/";
%let libto = library;
options mprint;
%macro Readcsv (csvfile=);
%let foundit = %sysfunc(exist(&path.&csvfile..csv));
%if &foundit %then %do;
filename csvTest "&path.&csvfile..csv";
data _null_;
infile csvTest;
input;
if _N_ eq 2 then do;
/* call symput("hasRows", "yes");*/
stop;
end;
run;
%end;
proc import datafile=&path.&csvfile..csv
out=&libto..&csvfile replace;
delimiter= ",";
getnames=yes;
scan=yes
mixed=yes;
run;
/* %end;*/
%mend Readcsv;
%Readcsv (csvfile= file_xyz);
%Readcsv (csvfile= file_abc);
There are some charecter constraints to deal with using proc import, and that is why I am reading the files using infile, and defining their lengths, formats and informats.
I think you could code something like :
filename x 'c:\x.csv'; data _null_; infile x ; input date ?? yymmdd10. ; if missing(date) then do; if _n_ eq 1 then call symputx('start',_infile_); else call symputx('end',_infile_); stop; end; run; %put start=&start end=&end;
Ksharp
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.