BookmarkSubscribeRSS Feed
Justin9
Obsidian | Level 7

Hi,

 

Please can someone help to provide suggestions/code for me create a reference table that can be updated every month, based on the values that are in a variable?

 

I currently have the code below, where I am trying to identify any new values for the 'Colour' variable (characteristic variable) that have not appeared my dataset (which gets updated every month to include the latest month) before. The reference table will be updated so that it will contain all of the values that have appeared before, but the code should allow me to add on any new values to the list.

 

As April and May is the first time that I have checked, the reference table that will be created should store all of the 'Colour' values from April and then if there are any new 'Colour' values in May, this will be added onto the reference table. As a result, I can substitute the dataset in my proc sql code below and compare my 'Test' dataset to the reference table that will be created.

 

For example, if accounts in April have 'Colour' values of Red, Blue, Green, Yellow and Orange, then the initial reference table created will contain all of these. If May in my 'Test' dataset has a new value of Purple, the reference table will add on Purple, so it will now have Red, Blue, Green, Yellow, Orange and Purple, which I can compare against and update every month in the future.

 

Code:

proc sql;
        select distinct Colour
                from Test
                where "01May2021"d<=MONTH<="31May2021"d
        except
        select distinct Colour
                from Test
                where "01Apr2021"d<=MONTH<="30Apr2021"d
       ;
quit;

/*Proc freq of more recent month*/
proc freq data=Test;
         tables Colour;
         where "01May2021"d<=MONTH<="31May2021"d;
run;

/*Proc freq of older month*/
proc freq data=Test;
         tables Colour;
         where "01Apr2021"d<=MONTH<="30Apr2021"d;
run;

 

 

 

 

3 REPLIES 3
Sajid01
Meteorite | Level 14

You can make use of the SAS UPDATE statement. 
In this approach your Reference dataset is considered the Master and the monthly dataset is the transaction dataset.
Please read the SAS documentation page at https://documentation.sas.com/doc/en/pgmsascdc/9.4_3.5/lestmtsref/p18w3br45er2qun1r8sfmm4grjyr.htm 
There are number of SAS papers on the subject.

Justin9
Obsidian | Level 7

Thanks for your reply. As I am starting from scratch, please can you tell me how I could create my master dataset (i.e. April from my example), where I would save down the 'Colour' values that appear (e.g. if there were six accounts and the colours for the accounts were Red, Green, Red, Orange, Green, Blue, then my master dataset should list out Red, Green, Orange and Blue) i.e. these values are what appears in my proc freq output in the 'Results' tab? My 'Test' dataset contains over 100,000 records, so it would be difficult to use datalines and manually type all of the values that appear.

 

From the link. the update statement appears to merge together all of the variables in a dataset. Is there a way to only select the 'Colour' variable when merging on the values in May (transaction dataset) onto the initial April (master dataset)?

Sajid01
Meteorite | Level 14

I have reviewed your posts. I have highlighted your questions in blue and my suggestions below.
There can be other approaches to solve your issue, but this is how I would do if I have to.
(1) Create a reference table that can be updated every month, based on the values that are in a variable.
Considering Reference as master and the monthly as the trasaction data set, the update statement can be used.
For this approach to be used for the unique value of the by variable, there should be only one record/row for the by variable.
The by variable is like primary key in a database


(2) I am trying to identify any new values for the 'Color' variable (characteristic variable) that have not appeared my dataset (which gets updated every month to include the latest month) before.
Create a separate table having a single column listing distinct colors in your master database. I will call it tbColor. If your new color is not present in the tbColor  then you can add the new record /color data to the Master or reference database. Proc SQL can be used for search and Proc Append to add the new record to the Master.

Once a new record has been added to the Master/Reference table, the tbColor need to be update or created afresh.


(3)My 'Test' dataset contains over 100,000 records, so it would be difficult to use data lines and manually type all of the values that appear.
Now is your dataset based only on color?. (I know there can be millions of shades.)

Looks like your situation falls in case (2) above. If you want to create a master/reference data set from this test table, you can use either Proc SQL or data step.
It depends upon your comfort level.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 3 replies
  • 839 views
  • 0 likes
  • 2 in conversation