Hi, I am using EG6.1. I sometimes import plain txt files, that contain data from line 2 onwards. In line 1 there are names of colums. I can chose the import step to recognized these as variable names. If these names are non-compliant with SAS variable names, they will be converted during the import. My problem is how to reference these in later data steps in regular SAS programs.
For instance a column gets this name: 'Auto H2O2 data'n. I am writing it exactly as it is mentioned in the properties tab seen when right clicking the column heading in the imported data in the EG61 environment. What is the right way to access this variable?
And in general where in the documentation is this discussed?
Regards
Poul Ravn Sørensen
Hi Poul,
By default EG does not convert variable names to the "V7 format". The column name is exatly the copy from the first row.
You can reference those column names in a data step with the same syntax you use in your original post.
For example:
data want;
set have;/*result of import*/
'Auto H2O2 data'n= 'Auto H2O2 data'n + 1; /*using "named constant"to reference a column*/
run;
Does this code work for you? If not, what is the error message?
Look at option validvarname in the documentation.
I recommend changing your default in EG to V7 compares to present value of ANY.
Hi Reeza, as you anticipated, the option was set to 'ANY', not V7. I tried to change it and run again an import step from an existing project. Now it gives lots of errors on account of the import step trying to name the columns according to the text (labels) in the first row of data in the flat file. (NB, it is not an Excel file, but a tab-delimited file, intended to imported into Excel originally, but for efficiency reasons now treated with SAS, which is much more powerful.)
So it comes back to the other part of the question: How to reference the variables that got names marked with 'my odd variable'n in the import step.
I tried to look in the documentation, but did not find the key for this , - yet.
Regards Poul
Drop the original names and use valid standard SAS ("V7") names.
Read the file manually in a data step with firstobs=2.
That way you avoid all the hassle with the 'some shit in here'n constructs. And you get the additional benefit that the data step may/will fail if the file structure changes unexpectedly.
Thanks for suggesting this.
However then names well then be just F1, F2 etc. I would be nice to able to take advantage of the fact that the variable names are - almost - available.
Is there a way to read the first line only and then use the strings there as variable names, with edits when the names contain illegal characters? Of course I could just have a data step with lots of renames but still, - not quite so elegant.
Regards
Poul
You can do this to automate it:
Read the file, but only line 1 (obs=1).
Use the scan() function to extract the variable names from the line.
Use the translate function to get rid of the characters.
Then use call execute to generate a data step (that starts with firstobs=2) with the corrected variable names.
But you still need to get a grip on the problem of variable types and lengths.
Correctly documented infile formats save you all this. Sadly, tools like Excel prevent users from developing good data processing habits.
So, do you see a need for a macro that grabs the variable names from reading line 1 for re-use when reading from line 2? I mean, reading from line 2 requires an input statement, which must specify which variables to read.
Or will what you suggest be able to do this without as macro string transfer mechanism?
Poul
You write a data step that generates the data step:
data _null_;
call execute ("data want;infile 'path_to_infile' firstobs=2;input");
infile "path_to_infile" obs=1;
input;
do i = 1 to countw(_infile_,';');
varname = scan(_infile_,i,';');
varname = translate(varname,"_"," ");
call execute(varname);
end;
call execute(';run;');
run;
(This is just out of my head, so completely untested, but it documents the idea)
But I still recommend against such tries at automatics. Have a solid documentation of the infile, and import according to that.
Hi Poul,
By default EG does not convert variable names to the "V7 format". The column name is exatly the copy from the first row.
You can reference those column names in a data step with the same syntax you use in your original post.
For example:
data want;
set have;/*result of import*/
'Auto H2O2 data'n= 'Auto H2O2 data'n + 1; /*using "named constant"to reference a column*/
run;
Does this code work for you? If not, what is the error message?
Hi Gergely,
the suggested notation works. I can use the variable name as shown in an assignment state, where a new variable is equated with the one referenced 'xxxxx'n. So it is recognized. I think this will work in general.
Thanks
Poul
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Check out this tutorial series to learn how to build your own steps in SAS Studio.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
