
Git Strategic Management of Code and Data -- SAS®, Python, R and More -- inSPyRed!

Started 03-15-2021
Modified 10-12-2021
Paper 1116-2021
Authors
 
 

Zeke Torres CEO Code629 

 

 

Abstract

With the advent of more team diversity in skills and syntax, the challenges of integrating new team members, code styles, code upgrades and syntax have only increased. Even if you are a single person creating and editing your own code, it's time to Git started. This is how to start if you don't know how to Git started, especially if you are working with a team. Teams need a practical outline of the elements required to achieve a strategic harmony of SAS, Python, R (and more) with Git or some other version control. Improve code transparency and version control with useful team code review methods. On top of all that, we must manage the data we use with our code. It's important to document a project's key decision points during an analytics cycle. Doing so increases the accuracy of analytics results by reducing issues due to missing key code documentation. There are key differences in the way analytics teams work versus how data engineering and ETL data preparation coders work. It's not just a syntax difference; it's also a code cultural difference. This presentation will outline how these cultures can co-exist and identify areas where they can collaborate. It will discuss the strengths they each contribute, while still taking advantage of the common elements of code governance, documentation and code version control.

 

Watch the presentation

Watch Git Strategic Management of Code and Data -- SAS®, Python, R and More -- inSPyred! as presented by the author on the SAS Users YouTube channel.

 

 

WHY USE GIT?

The ability to track our changes and simplify our work is probably the most important reason.

Even if the others on our team have not used Git or are not using Git, it's my opinion that it's necessary to use it just for our own solo work.

Our "code" changes, and often we can't easily reconcile the versions we end up with.

 

zekeT_sasaholic_0-1619020967733.png

 

As in the example above, we start with an "original" code file, and soon there is a new copy for one variation of the work.

Soon followed by another copy.

That can soon spiral.

 

 

zekeT_sasaholic_1-1619021079909.png

 

 

Watch More!

 

 

The Ingredients

Let's cover what you need to be familiar with. We need to learn about where we store our code.

These are common locations like GitHub or Bitbucket. Then there is the "interface" through which we manage our code changes and code change documentation. We might use software like GitHub Desktop or SourceTree.

However, users often confuse what these resources or locations represent.

Let's review that!

 

Watch More!

 

What is GITHUB, BITBUCKET?

Let's review where our code will be kept. That could be a repository for a solo single user, a team repository which can be private or public, or all the way through to a fully public repository for open collaboration. Our "remote" repository stores our code and allows collaboration to take place. From that remote repository we source our safe, secure code and project resources.

The main idea: GitHub and Bitbucket are simply a Dropbox or OneDrive for our code.

These are secure cloud services that help us manage the changes to our code and share our code with others in a secure way.

 

Watch More!

 

What is GITHUB DESKTOP, SOURCETREE?

This software is used on our local machine to commit our changes, document them and communicate with our remote repository. This software is free.

We can also use the Git command line. This paper won't go into the command line method of doing Git commits, because those steps are handled very nicely by the software this paper covers.

Link: GitHub Desktop

Link: SourceTree

 

Watch More!

 

What about SAS and Git?

I want to point out why this paper does not cover "SAS" and Git.

Papers by other presenters have done a superb job of covering this. A paper I highly recommend you look at next is by Joe Matise! SASGF Paper 1021-2021

My intention is to offer a wider, more common view of the tools that help with Git changes and work, and in this way demonstrate that the concepts of Git as a tool and resource are similar regardless of the Git management tool we use.

There is no doubt that the SAS family of products has been a leader in adopting and integrating Git for years. Credit must be given to SAS for recognizing the power of Git early and integrating it into its products.

I felt it was important to give an overview of common tools and basic commands, as well as review a few scenarios we often face as we first encounter Git. For example, a Python developer we collaborate with might mention that they use GitHub Desktop or some other software to manage Git work. This paper is meant to focus on Git, not just a SAS-centric use of Git.

 

 

Where to start with your existing code - as-is

To me, the challenge in helping someone get familiar with change control is the learner's impression that "so now I'll need to fix all my code - and then I'll start to use Git". The learner feels pressure to delay the Git process until some ideal scenario where their work is "perfect", and only then can they start.

This to me is not the ideal case.

I can't stress enough how important and critical it is to just "Git" started.

Simply initiating a Git repository is the best way to get familiar with the tool and its benefits. It’s safe. So this paper is crafted to show you "how" you could take an existing folder, project or code and simply Git started.

In this clip, let’s review a basic folder we might have and how it has lots of code, files and documents. It’s a short example of what anyone might find or encounter when they 'start' work.

 

Example Practice Folders (Zipped)

 

Watch More!

 

Key basics to be aware of

Before we really initiate the Git repository on that "folder" and start to engage with Git, let's review some elements to be aware of.

These are special folders where Git does its magic.

Git has some parts that "we" the users simply must not explore, change or touch.

The .git folder is the crux and heart of Git. It must remain "as-is" and we really should not tamper with it. Let's review how that folder looks on our desktops and what we always want to be aware of to ensure our local copies of our Git repositories stay nice and safe.
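
For readers who would rather confirm where that folder sits without opening it, here is a minimal Python sketch. It is not from the paper, which shows the folder on the desktop. It only checks whether a .git entry exists, walking upward from a starting folder and never reading or changing anything inside it. The function name find_repo_root is hypothetical.

```python
from pathlib import Path


def find_repo_root(start):
    """Return the nearest folder at or above `start` that contains `.git`, else None.

    The check is read-only: we only ask whether `.git` exists, and never
    look inside it or modify it.
    """
    current = Path(start).resolve()
    for folder in [current, *current.parents]:
        if (folder / ".git").exists():
            return folder
    return None


if __name__ == "__main__":
    # Report where (if anywhere) the current working folder's repository lives.
    print(find_repo_root("."))
```

Knowing which folder holds .git also helps with the later rule about paying attention to where the repository sits inside a project folder.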

Watch More!

 

Let's initiate our first repository

Now that we’ve covered the basics, we can Git started. We need a folder to initiate as a Git repository, the GitHub Desktop software and a GitHub account.
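
The paper performs this step in GitHub Desktop. For readers curious about what roughly happens underneath, the sketch below lists the equivalent Git commands and runs them from Python. It assumes Git is installed and a user name and email are already configured. The function names and the default commit message are hypothetical. Because the rules later in this paper say not to put data into a repository, data files should be excluded (for example with a .gitignore file) before staging everything.

```python
import subprocess


def first_commit_commands(message="Initial commit: existing project, as-is"):
    """Build the commands that turn an existing folder into a repository."""
    return [
        ["git", "init"],                   # creates the .git folder
        ["git", "add", "--all"],           # stages every file not ignored
        ["git", "commit", "-m", message],  # records the first snapshot
    ]


def run_all(folder, commands):
    """Run each command inside `folder`, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, cwd=folder, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all("path/to/folder", ...) to execute.
    for command in first_commit_commands():
        print(" ".join(command))
```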

 

Watch More!

 

What are branches?

Let's learn about what our branches mean to our code and our changes to our code.

Branches are our new best friends.

The management of branches and how we work now become useful elements of tracking our changes and code development.

Branches are to me a way of switching between versions of our code at any given time as we (or someone else) develop that code. The branches anyone makes (us, others) are meant to be protected by a fundamental set of rules like...

  • Do not change the code in a branch that belongs to someone else
  • Do not merge a branch into another branch unless that merge was approved or requested

This realm of Git and version control will prompt a flurry of new rules that you and your team will need to ponder for yourselves. I'll attempt to highlight what those are to give you a starting point. Ultimately, you'll need to adopt the main rules, and soon after you'll need to craft additional rules to accommodate your teamwork or even your own solo work.
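
As a rough illustration of starting work on a branch of our own, the sketch below builds the command-line equivalents of what a tool like GitHub Desktop does when we refresh a source branch and create a new branch from it. The function name, the remote name "origin" as a default and the example branch names are assumptions, not something the paper prescribes.

```python
def new_branch_commands(source_branch, new_branch, remote="origin"):
    """Build the commands to refresh `source_branch` and branch off from it."""
    return [
        ["git", "fetch", remote],                # learn about remote changes
        ["git", "checkout", source_branch],      # switch to the source branch
        ["git", "pull", remote, source_branch],  # bring it up to date
        ["git", "checkout", "-b", new_branch],   # create and switch to our branch
    ]


if __name__ == "__main__":
    # Hypothetical branch names, shown for illustration only.
    for command in new_branch_commands("main", "feature/monthly-report"):
        print(" ".join(command))
```

Each list can be passed to Python's subprocess.run from inside the repository folder. All of our edits then land on the new branch, leaving the source branch and other people's branches untouched.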

 

Watch More!

 

What is a commit?

In the 1980s and 1990s we often had software and computers that were prone to "locking up" as we worked on some document. Often those lock-ups meant that the work people did simply got lost. The phrase "Save Early Save Often" came up. It became a popular slogan, and then someone created "Autosave". The slogan became less important as machines and software became much more reliable.

Today my way of helping newbies to Git understand "commits" is that a commit is the new "Save Early Save Often". A "commit" is our way of securing our work and changes but also (crucially) the context of the change: the documentation about the change.

So this brings up the idea of "comments".

I'm often asked "Does this mean my code won’t have comments?". No. Not at all.

Our code will have comments about our code, as should always be the case.

However, our Git commit will have our comments about the "WHY and WHAT": what was going on that required the changes to the code. Let's review that COMMIT process.
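
To make the "WHY and WHAT" idea concrete, here is a small sketch of a commit whose message carries both. It relies on the fact that git commit accepts more than one -m option, with each one becoming its own paragraph of the message. The function name, the "Why:" prefix and the example text are hypothetical conventions, not part of Git or of the paper.

```python
def commit_commands(what, why, paths=("--all",)):
    """Build the stage-and-commit commands with a WHAT summary and a WHY body."""
    stage = ["git", "add", *paths]
    # First -m is the short summary (WHAT); second -m is a separate
    # paragraph explaining the reason for the change (WHY).
    commit = ["git", "commit", "-m", what, "-m", f"Why: {why}"]
    return [stage, commit]


if __name__ == "__main__":
    example = commit_commands(
        "Fix date filter in quarterly extract",
        "Rows from the last day of the quarter were being dropped",
    )
    for command in example:
        print(command)
```

In GitHub Desktop the same two pieces map onto the summary and description fields of the commit form.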

 

Watch More!

 

Pushing and pulling

As we have just seen, our code is committed and our changes are now documented. But those commits are just sitting in our local copy of the repository. They are not yet shared with or communicated to the repository we have on GitHub. Our reason to Push/Pull is to sync with that remote repository and ensure that the remote is current with our work and that we are current with any work anyone else is sharing with us.
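
The two directions of that sync can be sketched as two small command builders. This is an illustration only: the function names are hypothetical, and "origin" is assumed as the remote name because it is Git's usual default.

```python
def pull_commands(branch, remote="origin"):
    """Commands that bring work shared by others into our local copy."""
    return [
        ["git", "fetch", remote],         # download new commits, change nothing yet
        ["git", "pull", remote, branch],  # merge the remote branch into ours
    ]


def push_commands(branch, remote="origin"):
    """Commands that share our local commits with the remote repository."""
    return [
        # --set-upstream links the local branch to the remote one, which
        # matters the first time a new branch is pushed.
        ["git", "push", "--set-upstream", remote, branch],
    ]


if __name__ == "__main__":
    for command in pull_commands("main") + push_commands("feature/monthly-report"):
        print(" ".join(command))
```

Pulling is kept separate from pushing because a brand-new local branch has nothing on the remote to pull from until it has been pushed once.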

 

Watch More!

 

Pull request

The Pull Request is our new friend. This action allows either us (if we work solo) or others (in a team scenario) to check work before we make it part of a "merge" to another branch.

It is important that the Pull Request become part of the workflow, so that anyone in the Git repository can evaluate what is being done to branches that others (or we) may impact by this action.

If the commit is our "Save Early Save Often", our Pull Request is the action to mesh and merge the changes we've made into the desired branch in a permanent way.
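
Pull Requests are usually opened on the GitHub website or from GitHub Desktop. As one possible alternative, the sketch below builds the equivalent call for GitHub's command-line tool, gh. It assumes gh is installed and signed in, which the paper does not require. The function name and the example values are hypothetical.

```python
def pull_request_command(base, head, title, body):
    """Build a `gh pr create` call asking to merge `head` into `base`."""
    return [
        "gh", "pr", "create",
        "--base", base,    # the branch that will receive the changes
        "--head", head,    # the branch that contains our commits
        "--title", title,  # short summary reviewers see first
        "--body", body,    # context: what changed and why
    ]


if __name__ == "__main__":
    print(pull_request_command(
        "main",
        "feature/monthly-report",
        "Add monthly report",
        "Adds the monthly summary step; please check the date logic.",
    ))
```

Opening the request does not merge anything by itself. The merge happens only after the review, which is the checkpoint described above.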

 

Watch More!

 

Wait, I think I made a mistake!

I think this is one of the coolest things about Git, as you will soon see for yourself.

To me, a Git repository with useful commits and pull requests means we literally have a time machine at our disposal.

If we have done even a semi-decent job of performing commits and pull requests, it’s possible to switch to past commits or other branches without jeopardizing our current work or the work of others. More importantly, we can evaluate the work of others via branches or simply review what they are doing or changing.

That to me is one powerful aspect of Git.

Another is simply that, by having the repository, we can see the work, the changes and what was being done. We do not have to panic the way we would if our project were not managed with Git.
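
A minimal sketch of that "time machine" on the command line: list recent commits, look at the project as it was at one of them, then come back. The function names and the example commit id are hypothetical. It assumes current work has been committed first, since Git may refuse to switch when uncommitted changes would be overwritten.

```python
def history_command(limit=10):
    """Command that lists the most recent commits, one line each."""
    return ["git", "log", "--oneline", "-n", str(limit)]


def visit_commands(commit_id, return_branch):
    """Commands to view the project at `commit_id`, then return to our branch."""
    return [
        ["git", "checkout", commit_id],      # look at the past (detached HEAD)
        ["git", "checkout", return_branch],  # come back to the present
    ]


if __name__ == "__main__":
    print(" ".join(history_command(5)))
    # "abc1234" stands in for a commit id copied from the log output.
    for command in visit_commands("abc1234", "feature/monthly-report"):
        print(" ".join(command))
```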

 

Watch More!

 

Other syntax considerations

The goal of this paper was to demonstrate how to Git started with code we might have. Often, that code is part of a folder or project where more than one syntax is present. The recommendations that follow are from my point of view and from the idea that our projects are now more diverse. I seldom see a project where it's simply ONE syntax end to end.

In many cases where a consistent tool like Git is not present, teams have crafted some very strange or complex folder structures to enable different syntaxes to work better together.

This complexity is often not an issue until someone needs to work on other parts that cross over to a different syntax. Or the layout of the folders is simply far more complex than it would need to be if Git were part of our teamwork.

So let’s review some ways we can ideally segment our work over time, incrementally, from where we started (in our example) to something that allows more than one syntax to exist together.
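
Before segmenting anything, it can help to see which syntaxes a folder actually contains. The sketch below is one hypothetical way to take that inventory. The suffix-to-syntax table and the function name are assumptions for illustration, not a layout the paper prescribes. It only reads file names and moves nothing.

```python
from collections import defaultdict
from pathlib import Path

# Assumed mapping from file suffix to syntax; extend it for your own project.
SYNTAX_BY_SUFFIX = {".sas": "SAS", ".py": "Python", ".r": "R", ".sql": "SQL"}


def inventory_by_syntax(folder):
    """Group every file under `folder` by syntax, skipping Git's own folder."""
    root = Path(folder)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root)
        if not path.is_file() or ".git" in relative.parts:
            continue
        syntax = SYNTAX_BY_SUFFIX.get(path.suffix.lower(), "other")
        groups[syntax].append(relative.as_posix())
    return dict(groups)


if __name__ == "__main__":
    for syntax, files in inventory_by_syntax(".").items():
        print(f"{syntax}: {len(files)} file(s)")
```

An inventory like this gives a starting picture for deciding, incrementally, which pieces of work to segment first.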

 

Watch More!

 

A full cycle of work

Let's put all the parts together and review.

  • Initiating a repository for the first time has some similarities to working on an existing repository, so let's double check those steps. We will need to become familiar with initiating a repository, so review the media on that.
  • Once our repository is initiated, or as we encounter repositories that others created for us, let's review what our workflow will look like and make sure we are familiar with the key steps to be proficient in.

Media Examples

Example 1 - Our repository exists on our local laptop and also in GitHub.  Watch Media!

 

 

Rules to adopt

Let's start to gather those rules here that you need to adopt in some form or fashion.

  1. Do not put data into a Git repository.
  2. Keep all repositories private - until you are sure you can adjust those settings.
  3. Pay attention to 'where' your Git repository sits in the project folder you have and others have.
  4. Always "Fetch" and refresh your master/main branch from your origin (Remote Repo) before you do your work.
  5. Always "Create A New Branch" for your work - from the branch you need to work on. Again - NEVER work on the 'source' branch directly.
  6. Never work on the Remote Git Repository.
  7. Never work on the Master/Main branch. There is one exception to this... New Repo.
  8. Only in a New Repo - a new initiation of a Git repo - can we add things to Master/Main.
  9. Adopt a naming convention for your branches asap and adhere to that.
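
Two of the rules above lend themselves to a small automated check. The sketch below shows one possible approach for rule 1 (keeping data out of the repository with ignore patterns) and rule 9 (a branch naming convention). The data suffixes, the allowed prefixes and the function names are all hypothetical choices that a team would replace with its own.

```python
import re

# Rule 1: data file types to keep out of the repository (assumed list).
DATA_SUFFIXES = (".sas7bdat", ".csv", ".xlsx", ".parquet")

# Rule 9: an example convention such as "feature/add-report" (assumed).
BRANCH_PATTERN = re.compile(r"(feature|fix|docs)/[a-z0-9]+(-[a-z0-9]+)*")
PROTECTED_BRANCHES = {"main", "master"}


def gitignore_lines(suffixes=DATA_SUFFIXES):
    """Return .gitignore patterns that exclude files with the given suffixes."""
    return [f"*{suffix}" for suffix in suffixes]


def is_valid_branch_name(name):
    """True when `name` follows the convention and is not a protected branch."""
    if name in PROTECTED_BRANCHES:
        return False
    return BRANCH_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    print("\n".join(gitignore_lines()))
    print(is_valid_branch_name("feature/add-report"))
```

The lines returned by gitignore_lines can be written to a file named .gitignore at the top of the repository, ideally before the first commit.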

 

Additional resources

Here are some links I recommend you try for your learning resources!

 

  • What is Git? Explained in 2 minutes!: Link
  • Git and GitHub for Poets 13 minutes!: Link
  • What is Github: in 3.5 minutes!: Link
  • Git Tutorial Part 1: What is Version Control? 9.5 minutes!: Link

 

CONCLUSION

I hope this introduction to Git allows you to Git inspired to give it a try.  You’ll likely need to go through one cycle of initiating a repository and then merging a branch to really see the potential power it has to help you code.

 

REFERENCES

 

Matise, Joe. 2021/05. “Git and SAS®: A Match Made in (SAS®) Studio.” Available at https://communities.sas.com/t5/SAS-Global-Forum-Proceedings/Git-and-SAS-A-Match-Made-in-SAS-Studio/t...

 

Acknowledgements

I’d like to thank the following people for helping me get this topic together or simply tolerating my demeanor and being patient with me as I worked on this.

  • My daughter and son.
  • My sister and brother-in-law.
  • My SAS peers! Especially those on SASENSEI!

I especially would like to thank these kind people, whose contributions made this far less sophomoric than it would have been without their help.

  • Kay Easton – Media and Graphics, Research (Media)
  • Elizabeth Lopez – Research Confirmation
  • Kay Whitman – Review/Edits
  • Kushal Pokhrel – Review/Edits
  • Kiran Venna – Review/Edits
  • Tony Mayo – Review/Edits

Thanks to the following teams that recently have participated in this "Introduction to GIT" and helped me craft, rehearse and prepare this topic:

  • NORC
  • Vanderbilt University
  • Johns Hopkins University

Special thanks to the SASGF 2021 team!

 
