
Purpose: To allow code in "text" nodes, which is currently hidden inside EGP files because that information is compressed, to be searched by OS search utilities.

 

Change: Add settings to both the program's options and the file's options so that, when a project is saved, EG also saves certain nodes in an uncompressed XML file in the same destination folder as the usual EGP file. By "text node" we mean program nodes, "last code submitted" nodes, log nodes, note nodes, and so on. The program options would control the default behavior for newly created files, while the file's options would override those defaults for a specific file. I would not expect this to be used in conjunction with EGP files saved with password encryption.

 

Motivation: My working group has always used SAS Interactive and Batch and is accustomed to using the OS to perform text searches of their collections of SAS program files. The decision to design EGP files as compressed files with an internal file structure makes these files unsearchable in the usual, conventional ways. An idea like this would let an OS search EGP files as it does SAS files, if the user chooses.
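
To illustrate the gap, here is a minimal Python sketch of what searching inside an EGP file takes today, assuming only that the file is a readable zip archive (not password encrypted). The function name `egp_members_containing` is hypothetical, and because the encoding of the archive members is not known here, the sketch simply tries the search term as both UTF-8 and UTF-16 bytes. The match is case-sensitive.

```python
import zipfile


def egp_members_containing(egp_path, term):
    """Return the names of archive members whose raw bytes contain `term`.

    Assumption: the file at `egp_path` is an unencrypted zip archive.
    Nothing is assumed about the names or layout of its members.
    """
    # The member encoding is unknown, so look for the term in two encodings.
    needles = [term.encode("utf-8"), term.encode("utf-16-le")]
    matches = []
    with zipfile.ZipFile(egp_path) as archive:
        for name in archive.namelist():
            data = archive.read(name)  # decompresses the member in memory
            if any(needle in data for needle in needles):
                matches.append(name)
    return matches
```

This reports which archive member contains the text, not which project node it belongs to, which is part of why a plain, searchable companion file would be more convenient.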

 

Other ideas to implement in conjunction with this idea:

  1. store this XML file in the EGP file but not as a non-compressed file in the zip file structure.
  2. Program EG so that when it is asked to open an XML file like this, one that it created, it instead opens the accompanying sister EGP file, if that file exists and has the same modified date.
  3. The XML schema could be simple: one XML tag for each node, with at least the process flow and node names stored. Or it could be a more complicated schema, but not too complicated.
  4. Or the file is just a text file, not involving XML, but stored in a way that would prevent these files from being inadvertently run like conventional SAS code files.
  5. Allow user to choose which type of text nodes are to be included in the text output XML file: e.g. program nodes, "last code submitted" nodes, log nodes, and/or note nodes, etc.
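
As a rough illustration of items 3 and 5, the following Python sketch builds such a file with one XML element per node and a filter on node type. Everything named here is an assumption for illustration only: the tag names `egp_text` and `node`, the attribute names, and the dictionary layout of the input. EG does not expose nodes this way; the sketch only shows how small the schema could be.

```python
import xml.etree.ElementTree as ET


def build_sidecar_xml(nodes, include_types=("program", "note")):
    """Return an XML string with one <node> element per selected text node.

    `nodes` is a list of dicts with the hypothetical keys
    "process_flow", "name", "type", and "text".
    """
    root = ET.Element("egp_text")  # hypothetical root tag
    for node in nodes:
        if node["type"] not in include_types:
            continue  # the user chose not to export this type of node
        elem = ET.SubElement(
            root,
            "node",
            {
                "process_flow": node["process_flow"],
                "name": node["name"],
                "type": node["type"],
            },
        )
        elem.text = node["text"]  # special characters are escaped on output
    return ET.tostring(root, encoding="unicode")


example_nodes = [
    {"process_flow": "Process Flow", "name": "Import",
     "type": "program", "text": "proc print data=sashelp.class; run;"},
    {"process_flow": "Process Flow", "name": "Import log",
     "type": "log", "text": "NOTE: example log text"},
]
```

Calling `build_sidecar_xml(example_nodes)` with the default `include_types` keeps the program node and leaves out the log node.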
12 Comments
paulkaefer
Lapis Lazuli | Level 10

It's an interesting idea. I'm curious why not just store code as text (i.e., .sas files), which is easier to version control than zipped files (like EGP files), which are binary and by nature not suited for versioning? I see you mention searching logs/output as well, but these can be stored in text files.

PhilC
Rhodochrosite | Level 12

The idea is that all of this would be automated: it would be done every time a project file is saved (if the project file settings or the program settings are set to do so).

PhilC
Rhodochrosite | Level 12

ALSO - the point would not be that this text/XML file is runnable code, but that it could be searched for the presence of format names; library, column, and data set names; comment text; etc.
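
A minimal Python sketch of that kind of search, assuming the companion files use a hypothetical layout with one `<node>` element per text node carrying `process_flow` and `name` attributes, and a hypothetical `.egptxt` extension. None of this is an existing EG format.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def search_sidecars(folder, term, pattern="*.egptxt"):
    """Find `term` (case-insensitive) in the companion files under `folder`.

    Returns a list of (file name, process flow, node name) tuples so a hit
    can be traced back to a node in the parent EGP file.
    """
    needle = term.lower()
    hits = []
    for path in sorted(Path(folder).glob(pattern)):
        root = ET.parse(path).getroot()
        for node in root.iter("node"):
            if needle in (node.text or "").lower():
                hits.append((path.name, node.get("process_flow"), node.get("name")))
    return hits
```

For example, `search_sidecars("projects", "mylib.sales")` would list every node in that folder whose text mentions the data set, which is the same kind of answer an OS text search gives for plain .sas files.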

PhilC
Rhodochrosite | Level 12

I need to research this, but I don't know yet whether LOG output can be teed, that is, saved both in a project file and to an external text file. I just thought this would be an application of the idea that some EG users might want. (I say "teed" as in the TEE command used in certain UNIX command shells.)

paulkaefer
Lapis Lazuli | Level 10

This page discusses options for the SAS log. I think ALTLOG is what you are looking for, as it "specifies the destination for a copy of the SAS log."

PhilC
Rhodochrosite | Level 12

ALTLOG looks functional when using SAS in batch or SAS Interactive, but while using Enterprise Guide? Each node does not have an individual file name and path, so I don't know how that would work. Still, I can look into it. The following is a discussion similar to this concept but involving only logs:

 

 

ALTLOG is not discussed there; instead, the EG automation API approach from Chris Hemedinger is discussed. To save Chris some time...

 

I guess if I had the time to learn this and the influence to sell it to my co-workers, I would. I'd rather spend my time asking nicely for this.

PhilC
Rhodochrosite | Level 12
  • It's an interesting idea. I'm curious why not just store code as text (i.e., .sas files), which is easier to version control than zipped files (like EGP files), which are binary and by nature not suited for versioning? I see you mention searching logs/output as well, but these can be stored in text files.

Version control, great idea.  I simply want a file that I can include in a regular old file search like normal SAS code files. 

 

EGP is a different technology, and it's great, but the price of adopting it is that the buyer must also adopt a whole different set of technologies to retain the functionality they once had with the older alternative, or else give up that functionality. The price is too high, especially for the people using SAS that I know.

paulkaefer
Lapis Lazuli | Level 10

So maybe I'm just now fully understanding your request... basically the idea is to have something like .egp, but rather that it points to code files, rather than compressing them all into one? So you can open this probably-xml file in SAS (i.e., not runnable), and it would be able to show the process flow and have other benefits of .egp files (ordered lists, etc.), but the code and log files would remain text files stored wherever you like, so you can easily search using Linux or Windows file search tools?

 

This sounds entirely reasonable; XML is text, and can be versioned along with all the files. And you can store all the files in the same folder, or even point to files elsewhere. And it can be an option, as you mention; some users can stick with .egp, but the benefits of this new format (egp2? egpx?) would be attractive to users such as yourself (and acceptable under best practices of software engineering, including version control).

 

It sounds like this should not be difficult to implement given what's currently done. Another thing I would add would be to configure log storage locations "globally". In other words, configure in this new project type where the log/output of all code within it is stored. And perhaps allow relative paths, for projects with multiple anticipated users.

PhilC
Rhodochrosite | Level 12
  • So maybe I'm just now fully understanding your request... basically the idea is to have something like .egp, but rather that it points to code files, rather than compressing them all into one?

It would be an extra file saved with the EGP file, and its purpose would be to allow text searching. Any text found in this text-based file would also be found in the sister (or rather parent) EGP file.

 

  • So you can open this probably-xml file in SAS (i.e., not runnable),

Open it in EG and it would open the parent EGP file. If both were modified at the same time, the text in this file would be what was generated from the EGP file at save time.

 

  • and it would be able to show the process flow

and name of the node the text was sourced from

 

  • and have other benefits of .egp files (ordered lists, etc.),

and NOT have other benefits of EGP files ( more about ordered lists and version control later...)

 

  • but the code and log files would remain text files stored wherever you like, so you can easily search using Linux or Windows file search tools?

Yes, if the user chooses.

 

  • This sounds entirely reasonable; XML is text, and can be versioned along with all the files. And you can store all the files in the same folder, or even point to files elsewhere. And it can be an option, as you mention; some users can stick with .egp, but the benefits of this new format (egp2? egpx?)

egptxt?

 

  • would be attractive to users such as yourself (and acceptable under best practices of software engineering, including version control).

If there were a SAS program that could store its code on a file system in a way that made sense with a third-party version control system (e.g. GitHub), this idea wouldn't be needed. All the code files would be present on the file system and searchable.

 

  • It sounds like this should not be difficult to implement given what's currently done.

I think so.

 

  • Another thing I would add would be to configure log storage locations "globally". In other words, configure in this new project type where the log/output of all code within it is stored. And perhaps allow relative paths, for projects with multiple anticipated users.

See, you would need this for version control. I don't think my idea is the best one for helping EG work with version control systems. There needs to be more. But my bosses aren't going to consider using version control systems at the moment, so I would need to read up on the SAS papers about this on my own time.

paulkaefer
Lapis Lazuli | Level 10

> My bosses aren't going to consider using version control systems at the moment

 

I can't recommend version control enough. This article discusses it and links to papers from other SAS users. If it's not implemented at a higher level in the organization, you should be able to still use TortoiseGit and/or TortoiseSVN for individual projects.

 

SAS EG does have built-in version control support. See this paper, for example. What's interesting, and I haven't seen this yet, is that it allows for externally referenced programs and relative paths.

 

A couple benefits of individual files over one text file: (1) searching and returning line number, or count per file; (2) you can modify outside of SAS. The index idea you suggest is neat, but prevents you from modifying code outside of SAS.