Zeke Torres CEO Code629
Teams now bring more diverse skills and syntaxes, and that has only increased the challenge of integrating new team members, code styles, code upgrades and syntax. Even if you are a single person creating and editing your own code, it's time to Git started. This is how to start if you don't know how to Git started, especially if you are working with a team. Teams need a practical outline of the elements that bring SAS, Python, R (and more) into strategic harmony with Git or another version control system. Improve code transparency and version control with useful team code review methods. On top of all that, we must manage the data we use with our code. It's important to document a project's key decision points during an analytics cycle. Doing so increases the accuracy of analytics results by reducing issues caused by missing code documentation. There are key differences between the way analytics teams work and the way data engineering and ETL data preparation coders work. It's not just a syntax difference; it's also a difference in code culture. This presentation will outline how these cultures can co-exist and identify areas where they can collaborate. It will discuss the strengths each contributes while still taking advantage of the common elements of code governance, documentation and code version control.
Watch Git Strategic Management of Code and Data -- SAS®, Python, R and More -- inSPyred! as presented by the author on the SAS Users YouTube channel.
The ability to track our changes and simplify our work is probably the most important benefit of Git.
Even if others on our team have not used Git or are not using it, in my opinion it's necessary to use it for our own solo work.
Our "code" changes and often we can't easily reconcile the versions we end up with.
As in the example above, we start with the "original" code, and soon there is a new copy for one variation of the work.
That copy is soon followed by another.
Let's cover what you need to be familiar with. First, we need to learn where we store our code.
Common locations include GitHub and Bitbucket. Then there is the "interface" we use to manage our code changes and document them. We might use software like GitHub Desktop or SourceTree.
However, users often confuse what these resources or locations represent.
Let's review that!
Let's review where our code will be kept: a repository for a single solo user; a team repository, which can be private or public; or a fully public repository for open collaboration. Our "remote" repository stores our code and allows collaboration to take place. From that remote repository we source our safe, secure code and project resources.
The main idea: GitHub and Bitbucket are simply a Dropbox or OneDrive for our code.
These are secure cloud services that help us manage the changes to our code and share that code securely with others.
This software is used on our local machine to commit our changes, document them and communicate with our remote repository. This software is free.
We can also use the Git command line. This paper won't go into the command-line method of doing Git commits, because the software it covers handles that very nicely.
Link: GitHub Desktop
Link: SourceTree
I want to point out why this paper does not cover "SAS" and Git.
Papers by other presenters have done a superb job of covering this. A paper I highly recommend you read next is by Joe Matise (SASGF Paper 1021-2021).
My intention is to give a wider, more common view of the tools that help with Git changes and work, and in this way to demonstrate that the concepts of Git are the same regardless of the Git management tool we use.
There is no doubt that the SAS family of products has been superior in adopting and integrating Git for years. Credit must be given to SAS for recognizing the power of Git early and integrating it into its products.
I felt it was important to give an overview of common tools and basic commands, as well as to review a few scenarios we often face when we first encounter Git. For example, a Python developer we collaborate with might mention that they use GitHub Desktop or some other software to manage their Git work. This paper is meant to focus on Git, not just on a SAS-centric use of Git.
To me, the challenge with helping someone get familiar with change control is that the learner often has the impression: "So now I'll need to fix all my code, and then I'll start to use Git." The learner feels pressure to delay using Git until some ideal moment when their work is "perfect" and they can start.
To me, this is not the ideal approach.
I can't stress enough how important and critical it is to just "Git" started.
Simply initiating a Git repository is the best way to get familiar with the tool and its benefits. It’s safe. So this paper is crafted to show you "how" you could take an existing folder, project or code and simply Git started.
In this clip, let’s review a basic folder we might have, with lots of code, files and documents. It’s a short example of what anyone might encounter when they 'start' work.
Example Practice Folders (Zipped)
Before we really initiate the Git repository on that "folder" and start to engage with Git, let's review some elements we now need to be aware of.
These are special folders where Git does its magic.
Git has some parts that "we," the users, simply must not explore, change or touch.
A folder like .git is the crux and heart of Git. It must remain as-is, and we really should not tamper with it. Let's review how that folder looks on our desktops and what we always want to be aware of to keep our local copies of Git repositories nice and safe.
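To make the "look, but don't touch" rule concrete, here is a minimal Python sketch (standard library only) that checks whether a project folder is already a Git repository by testing for the .git entry, without opening or changing anything inside it. The function name is_git_repo is hypothetical, not part of any tool. One detail worth noting: Git sometimes uses a plain .git file instead of a folder (for worktrees and submodules), so the sketch accepts either.

```python
from pathlib import Path


def is_git_repo(folder: str) -> bool:
    """Return True if `folder` has a .git entry at its top level (hypothetical helper).

    This only checks that .git exists; it never opens, lists or modifies it.
    """
    marker = Path(folder) / ".git"
    # .git is normally a folder, but worktrees and submodules use a .git file.
    return marker.is_dir() or marker.is_file()
```

For example, is_git_repo("my_project") returns False for a plain folder and True once the folder has been initiated as a repository. From inside a repository, the command git rev-parse --is-inside-work-tree gives Git's own answer.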
Now that we’ve covered the basics, and we have a folder to initiate as a Git repository, the GitHub Desktop software and a GitHub account, we can Git started.
Let's learn what branches mean for our code and for our changes to that code.
Branches are our new best friends.
How we manage branches, and how we work with them, now become useful elements of tracking our changes and code development.
To me, branches are a way of switching between versions of our code at any given time as we (or someone else) develop that code. The branches anyone makes (us or others) are meant to be protected by a fundamental set of rules like...
This realm of Git and version control will prompt a flurry of new rules that you and your team will need to ponder for yourselves. I'll attempt to highlight what those are to give you a starting point. Ultimately, you'll need to adopt the main rules, and soon after you'll need to craft additional rules to accommodate your teamwork or even your own solo work.
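One way to picture a branch is as a name that points at a commit. The toy Python model below is a sketch under that assumption (the Commit and Repo classes are hypothetical, not Git's internals or any library's API): committing moves only the current branch forward, so work on a new branch never disturbs main.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Commit:
    message: str
    parent: Optional["Commit"] = None  # the commit this one was built on


@dataclass
class Repo:
    # Each branch is just a name pointing at one commit (its "tip").
    branches: Dict[str, Commit] = field(default_factory=dict)
    current: str = "main"

    def commit(self, message: str) -> Commit:
        # The new commit builds on the current tip; only this branch moves.
        new = Commit(message, parent=self.branches.get(self.current))
        self.branches[self.current] = new
        return new

    def create_branch(self, name: str) -> None:
        # A new branch starts at the current tip; no files or commits are copied.
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        self.current = name

    def log(self, name: str) -> List[str]:
        # Walk back through parents from the branch tip, newest first.
        messages: List[str] = []
        node = self.branches.get(name)
        while node is not None:
            messages.append(node.message)
            node = node.parent
        return messages
```

For example, after repo.commit("Add original SAS program"), repo.create_branch("fix-dates"), repo.switch("fix-dates") and repo.commit("Fix date parsing"), repo.log("main") still lists only the original commit, while repo.log("fix-dates") lists both.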
In the '80s and '90s we often had software and computers that were prone to "locking up" while we worked on a document. Those lock-ups often meant that people's work was simply lost. The phrase "Save Early, Save Often" became a popular slogan, and then someone created "Autosave." The slogan also became less important as machines and software became much more reliable.
Today, the way I help Git newbies understand "commits" is to say that a commit is the new "Save Early, Save Often." A commit is our way of securing our work and changes, but also (crucially) the context of the change: the documentation about the change.
So this brings up the idea of "comments".
I'm often asked "Does this mean my code won’t have comments?". No. Not at all.
Our code will still have comments about our code, as should always be the case.
However, our Git commit will carry our comments about the "WHY and WHAT": what was changed and why the change was needed. Let's review that COMMIT process.
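A team can turn the "WHY and WHAT" idea into a simple convention. The Python sketch below checks a commit message against one such convention: a short summary line (the WHAT), a blank line, then a body explaining the WHY. The 50-character limit and the function name check_commit_message are assumptions for illustration; Git itself does not enforce this layout.

```python
from typing import List


def check_commit_message(message: str, max_summary: int = 50) -> List[str]:
    """Return a list of problems; an empty list means the message passes.

    The rules here are a hypothetical team convention, not something Git enforces.
    """
    problems: List[str] = []
    lines = message.strip("\n").splitlines()
    summary = lines[0].strip() if lines else ""
    if not summary:
        problems.append("missing summary line (the WHAT)")
    elif len(summary) > max_summary:
        problems.append(f"summary longer than {max_summary} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("leave a blank line after the summary")
    # Everything after the blank line is the body, which should explain the WHY.
    body = [line for line in lines[2:] if line.strip()]
    if not body:
        problems.append("missing body explaining the WHY")
    return problems
```

For example, check_commit_message("update") returns ["missing body explaining the WHY"]: the change is named, but nobody will know later why it was made.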
As we have just seen, our code is committed and our changes are now documented. But those commits are just sitting in our local copy of the repository. They are not yet shared with the repository we have on GitHub. We Push and Pull to sync with that remote repository: a push makes the remote current with our work, and a pull brings in any work that anyone else is sharing with us.
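To see why we push and pull, it can help to compare the two histories. The Python sketch below is a toy model (the name sync_status is hypothetical, and real Git walks a commit graph rather than comparing lists): commits we have that the remote lacks need to be pushed, and commits the remote has that we lack need to be pulled. This mirrors the "ahead" and "behind" counts Git reports for a branch that tracks a remote.

```python
from typing import Dict, List, Union


def sync_status(local: List[str], remote: List[str]) -> Dict[str, Union[List[str], str]]:
    """Compare two branch histories given as commit IDs, oldest first (toy model)."""
    local_set, remote_set = set(local), set(remote)
    ahead = [c for c in local if c not in remote_set]    # ours, not yet pushed
    behind = [c for c in remote if c not in local_set]   # theirs, not yet pulled
    if ahead and behind:
        action = "pull (and merge), then push"
    elif ahead:
        action = "push"
    elif behind:
        action = "pull"
    else:
        action = "up to date"
    return {"ahead": ahead, "behind": behind, "action": action}
```

For example, sync_status(["a1", "b2", "c3"], ["a1", "b2"]) reports c3 as ahead and suggests a push.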
The Pull Request is our new friend. This action allows either us (if we work solo) or others (in a team scenario) to check work before it is merged into another branch.
It is important that the Pull Request becomes part of that workflow, allowing anyone in the Git repository to evaluate what is being done to branches that others (or we) may impact with this action.
If the commit is our "Save Early, Save Often," the Pull Request is the action that meshes and merges the changes we've made, permanently, into the desired branch.
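A pull request is also a natural place to enforce a team rule. The Python sketch below models one hypothetical rule: a pull request can be merged only after a set number of approvals, and on a team the author's own approval does not count. The PullRequest class and its fields are illustrative assumptions, not GitHub's API; on GitHub, a similar rule is usually configured through branch protection settings rather than code.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class PullRequest:
    author: str
    source: str  # branch holding the new work
    target: str  # branch the work should be merged into
    approvals: Set[str] = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        self.approvals.add(reviewer)

    def can_merge(self, required: int = 1, solo: bool = False) -> bool:
        # Hypothetical team rule: on a team, self-approval does not count.
        # Working solo, our own review of the pull request is the check.
        counted = self.approvals if solo else self.approvals - {self.author}
        return len(counted) >= required
```

The solo flag reflects the point above that a solo developer still benefits from reviewing their own pull request before merging.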
I think this is one of the coolest things about Git that you will soon see for yourself.
To me, a Git repository with useful commits and pull requests means we literally have a time machine at our disposal.
If we have done even a semi-decent job of making commits and pull requests, it’s possible to switch to past commits or other branches without jeopardizing our current work or the work of others. More importantly, we can evaluate the work of others via branches, or simply review what they are doing or changing.
That to me is one powerful aspect of Git.
Another is simply that, with the repository, we can see the work, the changes and what was being done. We do not have to panic the way we would if our project were not under Git management.
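The "time machine" works because every commit records a snapshot of the project. The toy Python model below (the History class is hypothetical, and real Git shares unchanged files between commits instead of copying them) shows the key property: going back to an older commit returns the old file contents while the newer commits stay intact.

```python
from typing import Dict, List, Tuple


class History:
    """Toy time machine: each commit keeps its own copy of the project files."""

    def __init__(self) -> None:
        # (message, snapshot) pairs, oldest first; the index is the commit number.
        self.commits: List[Tuple[str, Dict[str, str]]] = []

    def commit(self, message: str, files: Dict[str, str]) -> int:
        # Copy the dict so later edits to `files` cannot rewrite history.
        self.commits.append((message, dict(files)))
        return len(self.commits) - 1

    def checkout(self, number: int) -> Dict[str, str]:
        # Return a copy of an old snapshot; newer commits are left untouched.
        _message, snapshot = self.commits[number]
        return dict(snapshot)
```

For example, after committing an original report.sas and then a revised one, checkout(0) returns the original program text, and the revision is still there at commit 1.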
The goal of this paper was to demonstrate how to Git started with code we might already have. Often, that code is part of a folder or project where more than one syntax is present. The recommendations that follow are from my point of view and from the idea that our projects are now more diverse. I seldom see a project that is simply ONE syntax end to end.
In many cases where a consistent tool like Git is not present, teams have crafted some very strange or complex folder structures to help different syntaxes work better together.
This complexity is often not an issue until someone needs to work on parts that cross over to a different syntax, or until the folder layout is simply more complex than it would be if Git were part of our teamwork.
So, let’s review some ways we can segment our work incrementally over time, from where we started (in our example) to a layout that allows more than one syntax to exist together.
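A first step toward segmenting a mixed-syntax folder is simply taking inventory. The Python sketch below groups the files in a project folder by syntax based on file extension, skipping Git's own .git folder. The extension-to-syntax mapping and the function name group_by_syntax are assumptions; adjust them for the syntaxes your project actually uses, and treat the result as input for planning a layout, not as a required structure.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List

# Hypothetical mapping; extend it for whatever syntaxes your project uses.
LANGUAGE_BY_SUFFIX = {".sas": "sas", ".py": "python", ".r": "r"}


def group_by_syntax(folder: str) -> Dict[str, List[str]]:
    """Group files under `folder` by syntax, skipping Git's own .git folder."""
    root = Path(folder)
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue  # never list Git's internal files, and skip folders
        language = LANGUAGE_BY_SUFFIX.get(path.suffix.lower(), "other")
        groups[language].append(rel.as_posix())
    return {lang: sorted(files) for lang, files in groups.items()}
```

Lowercasing the extension means R scripts saved as .R or .r land in the same group.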
Let's put all the parts together and review.
Example 1: Our repository exists on our local laptop and also on GitHub. Watch the media!
Let's start to gather here the rules you need to adopt in some form or fashion.
Here are some links I recommend you try for your learning resources!
I hope this introduction to Git helps you Git inspired to give it a try. You’ll likely need to go through one cycle of initiating a repository and then merging a branch to really see the power it can have in helping you code.
Matise, Joe. 2021/05. “Git and SAS®: A Match Made in (SAS®) Studio” Available at
I’d like to thank the following people for helping me put this topic together, or simply for tolerating my demeanor and being patient with me as I worked on it.
I especially would like to thank these kind people for making this work far less sophomoric than it would have been without their help.
Thanks to the following teams that have recently participated in this "Introduction to GIT" and helped me craft, rehearse and prepare this topic:
Special thanks to the SASGF 2021 team!