Solved: Re: How to handle names generated when importing a file with headers i...

PoulRavn · Posted 02-09-2015 10:11 AM

Hi, I am using EG6.1. I sometimes import plain txt files, that contain data from line 2 onwards. In line 1 there are names of colums. I can chose the import step to recognized these as variable names. If these names are non-compliant with SAS variable names, they will be converted during the import. My problem is how to reference these in later data steps in regular SAS programs.

For instance a column gets this name: 'Auto H2O2 data'n. I am writing it exactly as it is mentioned in the properties tab seen when right clicking the column heading in the imported data in the EG61 environment. What is the right way to access this variable?

And in general where in the documentation is this discussed?

Regards

Poul Ravn Sørensen

gergely_batho · Posted 02-12-2015 05:07 AM

Hi Poul,

By default EG does not convert variable names to the "V7 format". The column name is exatly the copy from the first row.

You can reference those column names in a data step with the same syntax you use in your original post.

For example:

data want;

set have;/*result of import*/

'Auto H2O2 data'n= 'Auto H2O2 data'n + 1; /*using "named constant"to reference a column*/

run;

Does this code work for you? If not, what is the error message?

View solution in original post

Reeza · Posted 02-09-2015 10:22 AM

Look at option validvarname in the documentation.

I recommend changing your default in EG to V7 compares to present value of ANY.

PoulRavn · Posted 02-10-2015 02:15 AM

Hi Reeza, as you anticipated, the option was set to 'ANY', not V7. I tried to change it and run again an import step from an existing project. Now it gives lots of errors on account of the import step trying to name the columns according to the text (labels) in the first row of data in the flat file. (NB, it is not an Excel file, but a tab-delimited file, intended to imported into Excel originally, but for efficiency reasons now treated with SAS, which is much more powerful.)

So it comes back to the other part of the question: How to reference the variables that got names marked with 'my odd variable'n in the import step.

I tried to look in the documentation, but did not find the key for this , - yet.

Regards Poul

Kurt_Bremser · Posted 02-10-2015 02:49 AM

Drop the original names and use valid standard SAS ("V7") names.

Read the file manually in a data step with firstobs=2.

That way you avoid all the hassle with the 'some shit in here'n constructs. And you get the additional benefit that the data step may/will fail if the file structure changes unexpectedly.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PoulRavn · Posted 02-12-2015 03:07 AM

Thanks for suggesting this.

However then names well then be just F1, F2 etc. I would be nice to able to take advantage of the fact that the variable names are - almost - available.

Is there a way to read the first line only and then use the strings there as variable names, with edits when the names contain illegal characters? Of course I could just have a data step with lots of renames but still, - not quite so elegant.

Regards

Poul

Kurt_Bremser · Posted 02-12-2015 03:19 AM

You can do this to automate it:

Read the file, but only line 1 (obs=1).

Use the scan() function to extract the variable names from the line.

Use the translate function to get rid of the characters.

Then use call execute to generate a data step (that starts with firstobs=2) with the corrected variable names.

But you still need to get a grip on the problem of variable types and lengths.

Correctly documented infile formats save you all this. Sadly, tools like Excel prevent users from developing good data processing habits.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

PoulRavn · Posted 02-12-2015 04:18 AM

So, do you see a need for a macro that grabs the variable names from reading line 1 for re-use when reading from line 2? I mean, reading from line 2 requires an input statement, which must specify which variables to read.

Or will what you suggest be able to do this without as macro string transfer mechanism?

Poul

Kurt_Bremser · Posted 02-12-2015 04:32 AM

You write a data step that generates the data step:

data _null_;

call execute ("data want;infile 'path_to_infile' firstobs=2;input");

infile "path_to_infile" obs=1;

input;

do i = 1 to countw(_infile_,';');

varname = scan(_infile_,i,';');

varname = translate(varname,"_"," ");

call execute(varname);

end;

call execute(';run;');

run;

(This is just out of my head, so completely untested, but it documents the idea)

But I still recommend against such tries at automatics. Have a solid documentation of the infile, and import according to that.

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

gergely_batho · Posted 02-12-2015 05:07 AM

Hi Poul,

By default EG does not convert variable names to the "V7 format". The column name is exatly the copy from the first row.

You can reference those column names in a data step with the same syntax you use in your original post.

For example:

data want;

set have;/*result of import*/

'Auto H2O2 data'n= 'Auto H2O2 data'n + 1; /*using "named constant"to reference a column*/

run;

Does this code work for you? If not, what is the error message?

PoulRavn · Posted 02-12-2015 05:19 AM

Hi Gergely,

the suggested notation works. I can use the variable name as shown in an assignment state, where a new variable is equated with the one referenced 'xxxxx'n. So it is recognized. I think this will work in general.

Thanks

Poul

How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Re: How to handle names generated when importing a file with headers in line 1?

Catch up on SAS Innovate 2026

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away