SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Provided Headers vs. Provided Dataset without Headers

Accepted Solution Solved
Reply
Contributor
Posts: 47
Accepted Solution

Provided Headers vs. Provided Dataset without Headers

[ Edited ]

Hi, 

 

Background:

I have a data set without headers (we'll call it "Why"). This dataset has 900+ variable columns. I unziped, and convereted the file to a .txt and successfully imported this data set into SAS EG. 

 

I then have a .txt file with 900+ headers (we'll call it "Useful"), containing values to be associated with the large "Why" data set variables.

 

Both files (Why, with 900 variable columns and about and 1.3 million rows), and (Useful, with 900 columns and 1 row), have been imported into SAS and are sitting as SAS files within my Work.directory.

 

Question:

What is the best way to attach the "Useful" file headers to the "Why" file without headers to the appropriate column? 

Why are headers not attached to datasets? 

 

TS


Accepted Solutions
Solution
‎11-16-2015 09:31 AM
Super User
Super User
Posts: 6,502

Re: Provided Headers vs. Provided Dataset without Headers

The answer is simple. Use a RENAME statement. Either in the program that reads the data or later using PROC DATASETS.

 

You can generate the rename statement from the list of variable names.

So let's assume that you have created a SAS dataset with variables named VAR1 to VAR900.

So you could read in the list of names and generate the rename statement with a simple data step.

filename rename temp;
data _null_;
   infile 'headers' end=eof;
   length oldname newname $32 ;
   input newname;
   file rename ;
   if _n_=1 then put 'RENAME';
   oldname=cats('VAR',_n_);
   put oldname '=' newname ;
   if eof then put ';' ;
run;

Now you can use that rename statement to change the names.

data fixed;
   set original;
%include rename ;
run;

 

View solution in original post


All Replies
Regular Contributor
Posts: 161

Re: Provided Headers vs. Provided Dataset without Headers

When you refer to headers, do you mean the header row that refers to the column names? Sorry if it's a basic question.....just trying to understand.

Kannan Deivasigamani
Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

No prob, yes the header row that refers to the column names

Trusted Advisor
Posts: 1,115

Re: Provided Headers vs. Provided Dataset without Headers

[ Edited ]

Good question from Kannan. I was also wondering. Perhaps you mean variable label, i.e., a short description of a variable's content (e.g. "Patient number" for a variable named PATNO)?

 

After all, even if the "headers" found in the raw data are some kind of "names", would we be sure that these names happen to be valid SAS names? So, it would be safer to regard them just as "labels".

 

In this case, there were several answers posted to a similar question in the SAS-L mailing list, 15 years ago:

https://groups.google.com/forum/#!topic/comp.soft-sys.sas/awuwQAwP1AA

 

However, my impression was that your situation is slightly different.

 

There is also a 2008 paper "Dynamically Create Variable Labels from Data Set Values" which may be applicable, but I haven't checked that in detail either.

 

Instead, I've quickly written two macros, which seem to work at least for the test data I've created (26 columns, 13 rows). The first changes the labels of the variables in dataset WHY to the "headers" found in the 1-observation dataset USEFUL, the second changes the variable names correspondingly (assuming that the headers are in fact valid SAS variable names).

 

/* Macro to label variables in &DATA with labels found in &LABELS */

%macro lblvars(data=, labels=);

  /* Retrieve variable names */
  proc sql noprint;
  select name into :vn1 - :vn9999
    from dictionary.columns
    where libname='WORK' & memname=upcase("&data")
    order by varnum;
  quit;

  /* Retrieve variable labels */
  data _null_;
  set &labels;
  array lv[*] _character_;
  do i=1 to &sqlobs;
    call symput(cats('lbl',i), lv[i]);
  end;
  run;

  %put Now labeling &sqlobs variables ...;
   
  proc datasets nolist;
  modify &data;
  label 
  %do i=1 %to &sqlobs;
    &&vn&i="&&lbl&i"
  %end;;
  quit;

  %put ... Done.;
%mend lblvars;


/* Macro to rename variables in &DATA with new names found in &NAMES */

%macro renvars(data=, names=);

  /* Retrieve old variable names */
  proc sql noprint;
  select name into :vn1 - :vn9999
    from dictionary.columns
    where libname='WORK' & memname=upcase("&data")
    order by varnum;
  quit;

  /* Retrieve new variable names */
  data _null_;
  set &names;
  array nv[*] _character_;
  do i=1 to &sqlobs;
    call symput(cats('name',i), nv[i]);
  end;
  run;

  %put Now renaming &sqlobs variables ...;
   
  proc datasets nolist;
  modify &data;
  rename 
  %do i=1 %to &sqlobs;
    &&vn&i=&&name&i
  %end;;
  quit;

  %put ... Done.;
%mend renvars;


/* Create test data */

%let nvar=26;

data why;
array v[&nvar];
do i=1 to 13;
  do _n_=1 to &nvar;
    v[_n_]=ranuni(31416);
  end;
  output;
end;
drop i;
run;

data useful;
array hdr[&nvar] $40;
do _n_=1 to &nvar;
  hdr[_n_]=byte(64+_n_);
end;
run;

proc contents data=why;
run;


/* Apply the macros and see how meta data change */

%lblvars(data=why, labels=useful)

proc contents data=why;
run;

%renvars(data=why, names=useful)

proc contents data=why;
run;

Of course, this is just unvalidated draft code and it comes with no warranties whatsoever. In particular, the macros might fail if the headers are invalid labels or names, respectively.

 

Obviously, it is crucial that the n-th column of USEFUL contains the header for the n-th column of WHY for all n=1, ..., 900+.

 

As to your second question, I agree that it should be easier to retrieve variable names and labels from raw data.

 

Edit: For delimited raw data files such as CSV files there is in fact a way to retrieve variable names from the first record of the file: see PROC IMPORT, GETNAMES statement

 

Solution
‎11-16-2015 09:31 AM
Super User
Super User
Posts: 6,502

Re: Provided Headers vs. Provided Dataset without Headers

The answer is simple. Use a RENAME statement. Either in the program that reads the data or later using PROC DATASETS.

 

You can generate the rename statement from the list of variable names.

So let's assume that you have created a SAS dataset with variables named VAR1 to VAR900.

So you could read in the list of names and generate the rename statement with a simple data step.

filename rename temp;
data _null_;
   infile 'headers' end=eof;
   length oldname newname $32 ;
   input newname;
   file rename ;
   if _n_=1 then put 'RENAME';
   oldname=cats('VAR',_n_);
   put oldname '=' newname ;
   if eof then put ';' ;
run;

Now you can use that rename statement to change the names.

data fixed;
   set original;
%include rename ;
run;

 

Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

Hi, I believe your solution is what I need but is coded in a way I don't quite follow. 

I've outlined what I think you are doing... would you mind elaborating? 

filename rename temp;              (I assume this is renaming the temp table I'm creating)
data _null_;                       (don't create table when executing the following functions)
   infile 'headers' end=eof;       (read the file 'headers into SAS)
   length oldname newname $32 ;    (create two new variables 'oldname' & 'newname' length $32)
   input newname;                  (specify perviously created variable 'newname'?)
   file rename ;                   (Here is the rename statement but is what follows what is being defined as my old value and new value?)
   if _n_=1 then put 'RENAME';     (If the first time the data step has iterated = 1 then....what is put 'RENAME' doing?)
   oldname=cats('VAR',_n_);
   put oldname '=' newname ;
   if eof then put ';' ;
run;
Super User
Super User
Posts: 6,502

Re: Provided Headers vs. Provided Dataset without Headers

[ Edited ]

This step just creates a file with the rename statement. The syntax for a rename statement is: RENAME old=new old=new .... ;

 

* Create a temporary file to store the generated code ;
* Use the filref of RENAME to refer to it.;
filename rename temp; 

* Run a datastep without any output datasets ;
data _null_;  

* Read from the text file that lists the headers. Set EOF flag ;
   infile 'headers' end=eof;   

* Define two variables to use to store the names. SAS names are limited to 32 bytes. ;
   length oldname newname $32 ;  

* Read the next name from the input file. ;
   input newname; 

* Generate the next dummy name by concatenating the loop counter to the string 'VAR' ;
   oldname=cats('VAR',_n_);

* Specity that PUT statement output will go the temporary file instead of the log ;    
   file rename ;

* First time through write the beginning of the rename statement ; 
   if _n_=1 then put 'RENAME';    

* Write the old=new pair for this name ;
   put oldname '=' newname ;

* When at the end of the input then write the semi-colon to finish the  rename statement. ;
   if eof then put ';' ;

* End of the data step definition ;
run;

 

Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

Hi, thank you very much for talking the time to walk me through the code. If I could ask a couple more questions, 

 

the "infile" statement is what is causing some confusion as I'm getting the error, "not found within the current file path" and it's trying to find my file somewhere other than my "SAS.WORK" directory. The file I am reading from, containing my headers, is a temporary file sitting in my work directory.  I checked out other portions of your code by doing some reading and running them seperatly. The concatenation of string "var" with the counter makes sense....

 

The other part is we seem to be defining "oldname". I assume (input newnameSmiley Wink is where we are reading the variable within my "headers" table as the "newname" to be swapped for "oldname".  I think the rest makes sense. 

Super User
Super User
Posts: 6,502

Re: Provided Headers vs. Provided Dataset without Headers

[ Edited ]

GalacticAbacus wrote:

Hi, thank you very much for talking the time to walk me through the code. If I could ask a couple more questions, 

 

the "infile" statement is what is causing some confusion as I'm getting the error, "not found within the current file path" and it's trying to find my file somewhere other than my "SAS.WORK" directory. The file I am reading from, containing my headers, is a temporary file sitting in my work directory.  I checked out other portions of your code by doing some reading and running them seperatly. The concatenation of string "var" with the counter makes sense....

 

The other part is we seem to be defining "oldname". I assume (input newnameSmiley Wink is where we are reading the variable within my "headers" table as the "newname" to be swapped for "oldname".  I think the rest makes sense. 


It is extremely unlikely that the default directory is the directory that SAS creates automatically for the WORK library.  It would normally be the current working directory when the SAS command started running. Or you can change it by running the operarting system's cd command. If the file you want want to read is not in the current working directory then just include the fully quailified name of the file inside the quotes. If the file actual is in your work directory you can use the PATHNAME() function to find out where the WORK library is pointing. So you could change the INFILE to something like this: 

INFILE "%sysfunc(pathname(work))/headers.txt" end=eof ;

But that is very unlikely to be what you mean. It is more likely that you mean you have already read the headers into a SAS dataset in the WORK library.  If it is a dataset with one observation per header name then just replace the INFILE and INPUT statements with a SET statement. You will also not need the variable NEWNAME as you can just use the variable that already has the names instead.  The SET statement also supports the END= option so that you can tell when you get to the end of the list.

* Replace these ;
* INFILE 'header' end=eof' ;
* input newname ;
* With  ;
SET headers END=EOF ;

If the names of the variables that you created when you read the file without the headers are not VAR1 to VAR900 then you will need a similar list of variable names so that you know which old name goes with which new name.  If you don't have such a dataset you can generate a dataset with one observation per variable stored in a varaible named _NAME_ by using proc transpose.  So if your existing dataset that you want to rename the variables for is named WANT then you can make a dataset named OLDNAMES using this procedure call.

proc transpose data=want(obs=0) out=oldnames ;
  var _all_;
run;

 

Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

[ Edited ]

Hi, I think I am almost there, you were correct in that I ment I had a SAS dataset in my WORK library.

I have two datasets, both I've read into SAS as SAS datasets within my WORK library.

<> Dataset_1 = Work.Headerfile, with one variable "Name Convention" which is a list of my headers.

<> Dataset_2 = Work.Set_1, with 900+ variables and 1.2 million rows. Each variable data is under a header F"n", or F1, F2, F3, etc. out to F900.

 

Here is the alteration to the code: This ran, I'm working to see if it did what I wished. 

 

filename rename temp;
data _null_;
Set HEADER_FILE end=eof;
length oldname Name_Convention $32;
oldname = cats('F',_n_);
file rename;
if _n_ = 1 THEN PUT Name_Convention;
put oldname=Name_Convention;
if eof then put ';';
RUN;

 

While it seemed to run and if I replaced _null_ with the an output table name, the resulting data looked correct... (two variables, one with my desired header names, the other with variable "oldname" looking like F1 - F900... I then proceeded with 

 

DATA fixed;
set Set_1;
%include rename ;
run;

 

It ran, but did not complete came up with ERROR 180-322: Statement is not valid or it is used out of proper order.

I the length of "Name Convention" is also already set so I am unable to apply format $32.

Super User
Super User
Posts: 6,502

Re: Provided Headers vs. Provided Dataset without Headers

You need to change:

if _n_ = 1 THEN PUT Name_Convention;

to

if _n_ = 1 THEN PUT 'RENAME';

So that it will write the text RENAME to the file. This the beginning of the RENAME command.

 

You need to change:

put oldname=Name_Convention;

to

 

put oldname '=' Name_Convention;

 so that the equal sign is written to the file.  By not quoting the text literal you have asked SAS to write the variable OLDNAME using named format. So of oldname was 'F1' and Name_Convention was 'ID' it would end up witing something  'OLDNAME=F1 ID' to the file instead of 'F1=ID'.

 

If you want you can run this step to read in the file you created and dump it to the log so you can see what is in it.

data _null_;
   infile rename ;
   input;
   put _infile_;
run;

 

Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

Hi, thank you very much for talking the time to walk me through the code. If I could ask a couple more questions, 

 

the "infile" statement is what is causing some confusion as I'm getting the error, "not found within the current file path" and it's trying to find my file somewhere other than my "SAS.WORK" directory. The file I am reading from, containing my headers, is a temporary file sitting in my work directory.  I checked out other portions of your code by doing some reading and running them seperatly. The concatenation of string "var" with the counter makes sense.

 

The other part is we seem to be defining "oldname". I assume (input newnameSmiley Wink is where we are reading the variable within my "headers" table as the "newname" to be swapped for "oldname".  I think the rest makes sense. 

Trusted Advisor
Posts: 1,115

Re: Provided Headers vs. Provided Dataset without Headers

Hi @GalacticAbacus,

I'm just curious: Did you give my RENVARS macro a try? It was tailored to the situation you described in your initial post, i.e. two datasets with the same number of variables (900+). Tom's elegant original solution assumed that the "headers" data are a .txt file. Moreover, it required that the variable names of your dataset "WHY" are VAR1, VAR2, etc., whereas macro RENVARS does not make such an assumption.

Contributor
Posts: 47

Re: Provided Headers vs. Provided Dataset without Headers

Hi Freelance! 

 

I gravitated toward Tom's solution because I really enjoy learning exactly why and how something works before I apply it, and while I'm grateful for your taking the time to write the macro solution, it looked a little beyond my current abilities to undertand. I am grateful for the example thought. 

Regular Contributor
Posts: 161

Re: Provided Headers vs. Provided Dataset without Headers

Here is another way that might be simpler depending on what you have.......with a PROC SQL.   The IN dataset is what has the headers. The "test" dataset does not have proper headers but has names like v1, v2, v3, v4, v5.  This data is inserted into IN as shown below....

 

DATA IN;
	 SET SASHELP.CLASS(OBS=1);
RUN;

data test;
	infile datalines dlm='|';
	input  V1:$8. V2:$1. V3:4. V4:4. V5:4.;
datalines;
Alice|F|13|56.5|84.0
Barbara|F|13|65.3|98.0
;
RUN;
PROC SQL;
	INSERT INTO IN SELECT * FROM TEST;
QUIT;

PROC PRINT DATA=IN;TITLE '==IN';
PROC PRINT DATA=TEST;TITLE '==TEST';

Please note that this involves I/O read/writes but is a simple INSERT statement interms of coding.

 

The output results shows this below:

 

==IN
Obs	Name	Sex	Age	Height	Weight
1	Alfred	M	14	69.0	112.5
2	Alice	F	13	56.5	84.0
3	Barbara	F	13	65.3	98.0
==TEST
Obs	V1	V2	V3	V4	V5
1	Alice	F	13	56.5	84
2	Barbara	F	13	65.3	98

Hope this helps...!!!

Kannan Deivasigamani
☑ This topic is SOLVED.

Need further help from the community? Please ask a new question.

Discussion stats
  • 18 replies
  • 2406 views
  • 1 like
  • 4 in conversation