BookmarkSubscribeRSS Feed
Tom
Super User Tom
Super User

You are missing a step in your description. 

Once you have the metadata available in a usable form you need to use that information to drive the process of reading the actual data.  That process will vary depending on the software you are using to read in the CSV file.  In SAS you would want to use the information to write a data step.

 

So first you need to decide what type of data step you want to generate.  Then you can plan how to generate it from the metadata.

 

So for example for SASHELP.CARS you might want to generate a data step that looks like what I showed before:

data work.cars (label='2004 Car Data');
  infile "cars.csv" dsd dlm='|' truncover;
  length Make $13 Model $40 Type $8 Origin $6 DriveTrain $5
    MSRP 8 Invoice 8 EngineSize 8 Cylinders 8 Horsepower 8 MPG_City  8
    MPG_Highway 8 Weight 8 Wheelbase 8 Length 8
  ;
  informat MSRP comma. Invoice comma. ;
  format MSRP dollar8. Invoice dollar8. ;
  label EngineSize='Engine Size (L)' MPG_City='MPG (City)'
    MPG_Highway='MPG (Highway)' Weight='Weight (LBS)'
    Wheelbase='Wheelbase (IN)' Length='Length (IN)'
  ;
  input Make -- Length ;
run;

So if you have a dataset with MEMNAME MEMLABEL TYPE LENGTH INFORMAT FORMAT and LABEL you could use it to write such a program using a data step like this:

filename code temp;
data _null_;
  file code ;
* Write data statement and INFILE statement ;
  put 'data ' memname '(label=' memlabel :$quote. ')' 
     /  '  infile "myfile.csv" dsd truncover firstobs=2;'
  ;
* Write LENGTH statement ;
  put 'length ' @;
  do p=1 to nobs;
     set metadata point=p nobs=nobs;
     put name @;
     if type='String' then put '$' @;
     put length @;
   end;
   put ';' ;
* Write informat statement ;
   put 'informat ' @;
   do p=1 to nobs;
     set metadata point=p ;
     if informat ne ' ' then put name informat @;
   end;
   put ';' ;
* Write format statement ;
   put 'format ' @;
   do p=1 to nobs;
     set metadata point=p ;
     if format ne ' ' then put name format @;
   end;
   put ';' ;
* Write label statement ;
   put 'label' @;
   do p=1 to nobs;
     set metadata point=p ;
     if label ne ' ' then put name label :$quote. @;
   end;
   put ';' ;
* Write input statement ;
   p=1;
   set metadata point=p;
   put 'input ' name '-- ' @ ;
   p=nobs;
   set metadata point=p;
   put name ';' ;
* Write run statement ;
   put 'run;' ;
run;

And once you have the code in a file you can use %INCLUDE to run it.

%include code / source2;

Or perhaps you would rather generate a data step that looks like this;

data work.cars (label='2004 Car Data');
  infile "cars.csv" dsd dlm='|' truncover;
  attrib Make length=$13 ;
  attrib Model length=$40 ;
  attrib Type length=$8 ;
  attrub Origin length=$6;
  attrib DriveTrain length=$5;
  attrib MSRP length=8 informat=comma. format=dollar8. ;
  attrib Invoice length=8 ;
  attrib EngineSize length=8 label='Engine Size (L)' ;
/* fill in the rest of them */
  input Make -- Length ;
run;

Which would be simpler to create via a data step like

filename code temp;
data _null_;
  file code ;
  set metadata end=eof;
  if _n_=1 then do;
    firstvar=name;
    retain firstvar ;
    put
   'data ' memname '(label=' memlabel :$quote. ');'
  /'  infile "myfile.csv" dsd truncover firstobs=2;'
   ;
  end;
  put 'attrib ' name length= @;
  if format ne ' ' then put format= @;
  if informat ne ' ' then put informat= @;
  if label ne ' ' then put label = :$quote. @;
  put ';' ;
  if eof then put 
   '  input ' firstvar '-- ' name ';'
/ 'run;'
  ;
run;

You will need to work out the details.  For example if you have TYPE and LENGTH as two variables then you need to have logic in the code generation step to convert that in the SAS syntax you need to generate.  Like I showed in the first example 

     if type='String' then put '$' @;
     put length @;

But if you have length with values like $6 and 8 then you can just use the value directly as the code snippet you need to generate, like I did in the second example.

 

And if you are trying to read the CSV with python or R or some other future language you would need different code generation.

 

Season
Barite | Level 11

I am running your first code and am running into troubles. First of all, despite reading SAS Help, I am still confused on the meaning of the FILENAME and FILE statement. What does "code" that immediately follow the two words mean here? Can it be replaced by something else? Secondly, SAS reported that the variable memname was assigned to be both numeric and character, which is not the case in my metadata SAS dataset, which was imported into SAS from a spreadsheet in the first place.

Tom
Super User Tom
Super User

A FILENAME statement is used to define a FILEREF, that is a nickname you can use to point to an actual file.

A FILE statement is used to tell the DATA step what file to read from.

 

The FILENAME statement I used

filename code temp;

would define the fileref CODE pointing to a temporary file. Essentially a file in the WORK directory that SAS will delete when you end your SAS session.

 

Then the FILE statement will use that fileref instead of a quoted physical filename.

 

When you say "imported into SAS from a spreadsheet" do you mean you used PROC IMPORT to read from an XLSX file?  If so then the only reason MEMNAME would be numeric in that dataset would be if none of the cells where character strings.   In which case the issue is in the data step.  When compiling a data step SAS will define the TYPE of a variable as soon as it has to.  Usually when it first SEES the variable.  If SAS cannot tell from the context that the variable should be character it will define it as numeric.  

 

Is it possible your metadata does not have a variable named MEMNAME? 

Season
Barite | Level 11

Thank you for your response! I further explored your code and found that the FILE and FILENAME statements indeed produces a temporary file in the SAS Temporary Files dataset but is not executed immediately after running the DATA step. The %include statement causes the immediate execution. This process, from my persepctive, is like first producing a jack-in-a-box with the DATA step and then use the %include the statement to open it.


@Tom wrote:

When you say "imported into SAS from a spreadsheet" do you mean you used PROC IMPORT to read from an XLSX file?  If so then the only reason MEMNAME would be numeric in that dataset would be if none of the cells where character strings.   In which case the issue is in the data step.  When compiling a data step SAS will define the TYPE of a variable as soon as it has to.  Usually when it first SEES the variable.  If SAS cannot tell from the context that the variable should be character it will define it as numeric.  

 

Is it possible your metadata does not have a variable named MEMNAME? 


Yes, I did create the dataset named metadata with Excel. Strange though, the error was stubborn despite a check in the imported metadata dataset that proved that MEMNAME was indeed character instead of numeric. I therefore had to assign a character format to MEMNAME in the DATA step code like this:

filename code temp;
data _null_;
set metadata;
  file code ;
* Write data statement and INFILE statement ;
  put 'data ' memname:$60. '(label=' memlabel :$quote. ');' 
       '  infile "cars.csv" dsd truncover firstobs=2;'/*Replace the address of the CSV with a real one in the case of importation*/
  ;
* Write LENGTH statement ;
  put 'length ' @;
  do p=1 to nobs;
     set metadata point=p nobs=nobs;
     put name @;
     if type='string' then put '$' @;
     put length @;
   end;
   put ';' ;
* Write informat statement ;
   put 'informat ' @;
   do p=1 to nobs;
     set metadata point=p ;
     if informat ne ' ' then put name informat @;
   end;
   put ';' ;
* Write format statement ;
   put 'format ' @;
   do p=1 to nobs;
     set metadata point=p ;
     if format ne ' ' then put name format @;
   end;
   put ';' ;
* Write label statement ;
   put 'label ' @;
   do p=1 to nobs;
     set metadata point=p ;
     if label ne ' ' then put name '=' label :$quote. @;
   end;
   put ';' ;
* Write input statement ;
   p=1;
   set metadata point=p;
   put 'input ' name '-- ' @ ;
   p=nobs;
   set metadata point=p;
   put name ';' ;
* Write run statement ;
   put 'run;' ;
   stop;
run;
%include code / source2;

In addition, I found the LABEL statement lacking an equal sign and added one in the code. In addition, I found SAS keep running the code endlessly without a STOP statement at the end. The addition leads to a successful importation.

Still, I would still like to ask a few questions on formats. It can be seen from the code that when using the PUT statement to execute commands, a colon (:) has to be added between the name of format and the name of variable. However, if ordinary DATA steps, this is not required (e.g., Make $13 instead of Make: $13.). I wonder if my summarization is correct and if there are other instances where a colon is needed. Much appreciation is given if you could elaborate on the reason why such a difference exists.

In addition, a dot has to be appended to the quote. format while those seem to be not necessarily needed for character formats with predefined length (e.g., $60 and $60. are both OK) as well as numeric formats. Could you please give a reason for this? In addition, are the formats with and without a following dot at the end of their names really the same or they are different yet this case cannot reveal this difference?

Tom
Super User Tom
Super User

Adding the format to the PUT statement will allow the data step to make a better guess about how to define MEMNAME.  But that will not solve the issue that MEMNAME has not yet gotten a value at that point in the data step.

 

The problem is that the PUT statement that uses MEMNAME is BEFORE the SET statement.  So the data step compiler guesses that you wanted MEMNAME as numeric, and then when the first SET statement runs there is a conflict.

 

For that particular data step the solution would be to make sure the SET statement is before the PUT statement.  Not only will that fix the type conflict it will also mean that MEMNAME actually has a value that can be written by the PUT statement.

 

You could add another SET statement at the top of the step before the first PUT.  But then the data step would iterate once for each observation in the dataset, writing multiple copies of the data step to the output file.   To fix that you could add a STOP statement to the end of the data step so that it stops after the first iteration.  Or you could add an OBS=1 dataset option to the SET statement. In that case the data step would stop at the top of the second iteration when it tries to read a second observation from an input that only had one observation.  That is how normal data step stop.  

 

Programming hint:   Note you could also use a FILEREF in the generated code.  In that case the logic to generate the code would not need to change for a different input file. Instead just make sure to define the fileref before running the generated code.

 

Programming style hint: Place comments about the code BEFORE the code, not after. That will make it easier for humans to read.   And your lines of code will not get so long that a human would have to shift their field of focus left and right to read it.

 

...
* Write data statement and INFILE statement ;
  set metadata (obs=1);
  put 'data ' memname '(label=' memlabel :$quote. ');' 
      '  infile csv dsd truncover firstobs=2;'  
  ;
...
/*Replace the address of the CSV with a real one */
filename csv 'myfile.csv';
%include code / source2;

 

About the format you picked:  MEMNAME values can only be 32 bytes long, just like variable NAME values.  Only if you have the system option VALIDMEMNAME set to EXTEND so that non-standard names are possible would there be a need to write more than 32 bytes.  But in that case you might want to adapt the code to handle such names. Unfortunately SAS does not supply a $NLITERAL. format you could use.  You could modify the lengths of the MEMNAME and NAME variables and store the name literals in the metadata.  Or modify the code generation step to convert the values using the NLITERAL() function before writing them.

 

Season
Barite | Level 11

Thank you for your reply!


@Tom wrote:

Adding the format to the PUT statement will allow the data step to make a better guess about how to define MEMNAME.  But that will not solve the issue that MEMNAME has not yet gotten a value at that point in the data step.

 

The problem is that the PUT statement that uses MEMNAME is BEFORE the SET statement.  So the data step compiler guesses that you wanted MEMNAME as numeric, and then when the first SET statement runs there is a conflict.

 

For that particular data step the solution would be to make sure the SET statement is before the PUT statement.  Not only will that fix the type conflict it will also mean that MEMNAME actually has a value that can be written by the PUT statement.

 

You could add another SET statement at the top of the step before the first PUT.  But then the data step would iterate once for each observation in the dataset, writing multiple copies of the data step to the output file.   To fix that you could add a STOP statement to the end of the data step so that it stops after the first iteration.  Or you could add an OBS=1 dataset option to the SET statement. In that case the data step would stop at the top of the second iteration when it tries to read a second observation from an input that only had one observation.  That is how normal data step stop.  


Yes, one of the key measures I took to tackle the problem was adding a SET statement after the "data _null_" statement. This informs SAS of where the variable MEMNAME comes from as well as its characteristics like its length and type. I forgot to say that in yesterday's response but that modification is present in the code. In addition, in an attempt to simplify the code conducted today, I found that the format for MEMNAME can actually be removed once the SET statement is added.


@Tom wrote:

Programming hint:   Note you could also use a FILEREF in the generated code.  In that case the logic to generate the code would not need to change for a different input file. Instead just make sure to define the fileref before running the generated code.

 

...
* Write data statement and INFILE statement ;
  set metadata (obs=1);
  put 'data ' memname '(label=' memlabel :$quote. ');' 
      '  infile csv dsd truncover firstobs=2;'  
  ;
...
/*Replace the address of the CSV with a real one */
filename csv 'myfile.csv';
%include code / source2;


Thank you for pointing out a way to further simplify the code! Adding yet another FILENAME statement renders a more handy code that can be more easily compiled into a macro. This is very useful.

Patrick
Opal | Level 21

@Tom Wouldn't the expression need to use Not Equal to drop the first line of a new file?

if fname NE lag(fname) then delete;

 

 

Tom
Super User Tom
Super User

@Patrick wrote:

@Tom Wouldn't the expression need to use Not Equal to drop the first line of a new file?

if fname NE lag(fname) then delete;

 

 


Yes. Perhaps that was the source of the confusion?

Season
Barite | Level 11
Not really... The confusion actually stemed from my lack of understanding on the roles certain statements or operators play in the code, especially the @ operator and the INPUT statement with no variable name specified.
Tom
Super User Tom
Super User

Do you know the structure of the files?  If so just write the data step to read it.

Are the files all using the same structure? If so you can write a data step that reads all of the file at once.

 

If you need a better tool for GUESSING how to read the file try this macro instead of PROC IMPORT.

%CSV2DS() 

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 54 replies
  • 2174 views
  • 36 likes
  • 7 in conversation