BookmarkSubscribeRSS Feed
andreas_lds
Jade | Level 19

@Ravindra_ wrote:

@Tom and @ballardw  thanks for your concern, i am having the same concern about our company having such policies and not sure how to handle this. I will suggest them to use a metadata having defined structure where we can look into and generate output accordingly.

 

In this regard, i am having a small query, is it possible to automatically pick the metadata information having variable attribute information and map that with csv data and import that to SAS dataset. If it is possible how do we do that.

 

Thanks again.


Yes, it is possible to use metadata to create data steps reading csv files. But compared to writing the data step directly, this is not trivial.

ballardw
Super User

@Ravindra_ wrote:

@Tom and @ballardw  thanks for your concern, i am having the same concern about our company having such policies and not sure how to handle this. I will suggest them to use a metadata having defined structure where we can look into and generate output accordingly.

 

In this regard, i am having a small query, is it possible to automatically pick the metadata information having variable attribute information and map that with csv data and import that to SAS dataset. If it is possible how do we do that.

 

Thanks again.


Considering the number of different ways I have received "metadata" descriptions about source data files I would answer the "is it possible" with a "it probably depends".

 

I am very leery of blind use of any such tool. If there is something that gets close and then you have a knowledgeable person review the generated data that might be the way to go.

 

One simple example of issues around automated generation could be variable names length and characters. SAS currently uses 32 characters and must letter, digit or underscore. I really don't like the options that allow use of the extended characters where you have to constantly refer to variables as "Some #garbage"n . And the heuristics for very long source variables that are identical for the first 32+ characters may result in some oddness indeed.

Tom
Super User Tom
Super User

It can be done.  How depends on what your specific issues (risks) are with the data you are receiving.

 

If the issue is that the column names are always valid, but the order and/or inclusion/exclusion of columns from a particular file can vary then you can read the header row only and use that to drive the creation of the program to read based on the expected format for that column.  It can even be as simple as something like: (1) Get variable names from header. (2) Generate a data step that defines the structure by reading in just those variables from a template dataset.  So something like:

data _null_;
  infile "myfile.txt" obs=1;
  input;
  call  symputx('varlist',translate(_infile_,' ',','));
run;
data want;
  if 0 then set mylib.template(keep=&varlist);
  infile "myfile.txt" dsd truncover firstobs=2;
  input &varlist;
run;

You could also read all of the files into a tall skinny file and then do your own logic to figure out what variable is what and what type of value it contains.  So something like:

data 
  files (keep=fileno filename)
  values(keep=fileno rowno colno value)
;
   length fileno rowno colno 8 value $200 fname filename $256;
   retain fileno 0 rowno 0;
   infile "c:\downloads\h*.csv" dsd truncover length=ll column=col filename=fname;
   input @;
   if fname ne lag(fname) then do;
       fileno+1; rowno=0; filename=fname; output files;
  end;
  rowno+1;
  do colno=1 by 1 until(col>ll);
      input value @;
      if value ne ' ' then output values;
  end;
run;
NOTE: The infile "c:\downloads\h*.csv" is:
      Filename=c:\downloads\Honda.csv,
      File List=c:\downloads\h*.csv,RECFM=V,
      LRECL=32767

NOTE: The infile "c:\downloads\h*.csv" is:
      Filename=c:\downloads\Hummer.csv,
      File List=c:\downloads\h*.csv,RECFM=V,
      LRECL=32767

NOTE: The infile "c:\downloads\h*.csv" is:
      Filename=c:\downloads\Hyundai.csv,
      File List=c:\downloads\h*.csv,RECFM=V,
      LRECL=32767

NOTE: 17 records were read from the infile "c:\downloads\h*.csv".
      The minimum record length was 75.
      The maximum record length was 110.
NOTE: 1 record was read from the infile "c:\downloads\h*.csv".
      The minimum record length was 68.
      The maximum record length was 68.
NOTE: 12 records were read from the infile "c:\downloads\h*.csv".
      The minimum record length was 83.
      The maximum record length was 92.
NOTE: The data set WORK.FILES has 3 observations and 2 variables.
NOTE: The data set WORK.VALUES has 450 observations and 4 variables.

 

Tom
Super User Tom
Super User

... we are not allowed to write attributes within data step ...

You should check on this. Either someone is setting rules that knows nothing about how to write SAS code. Or there is some nuance in this policy that did not get communicated properly.

 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 18 replies
  • 3221 views
  • 2 likes
  • 6 in conversation