BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
CSpanias
Fluorite | Level 6

Hello all,

 

I am brand new to SAS programming and I am trying to learn everything on the go. I managed to complete the SAS Foundations 1 course which helped me greatly, but I am still troubled by some things I am seeing and I hoped that I could get some help here. 

 

You can find below a snippet of a long macro definition:

 

 

data work.&brand._intl_all(keep=walker_gl principal_amt  no_cc gbp no_bal no_glcode outrec) 
        work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec); * create two SAS tables;

    if _n_ eq 1 then
        do;
            declare hash CC(dataset:"workspds.liq_walker_cc_&YYYY_MM.");
            CC.definekey('walker_costcentre');
            CC.definedone();
         end;

    set &libre..liq_&brand._intl_extract_&yyyy_mm. (rename=(derived_cc=WALKER_COSTCENTRE)) end=eof;

From my limited understanding, this DATA STEP:

 

  1. imports the &libre..liq_&brand._intl_extract_&yyyy_mm. table
  2. creates two new temporary tables based on that:
    1. work.&brand._intl_all(keep=walker_gl principal_amt no_cc gbp no_bal no_glcode outrec)
    2. work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec);
  3. applies some modifications, i.e. keeping / dropping columns.

The thing is that only the first two columns, i.e. walker_gl and principal_amt, are located in the imported table. All others, i.e. no_cc, gbp, no_bal, no_glcode, and outrec, are global variables that were defined at the very top of the program.

So, I have two questions:

 

  1. Is work.&brand._intl_all(keep=walker_gl principal_amt no_cc gbp no_bal no_glcode outrec) keeping just those two columns from the imported table and adding the global variables as columns initialized all to 0? 
  2. What work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec) is doing? Since all are global variables only existing on the temp table mentioned above (if that is the case), does that mean that is filtering the first temp table and therefore keeping only walker_gl and principal_amt columns?  

Thank you in advance for your time and help.

 

Regards,

 

Charalampos

 

P.S. I am confident that I am missing or misinterpreting something that more experience programmers will spot right away, that is why I posted just this part of the macro. If more info is needed, I can post the whole macro definition (although it is quite long).

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

The key part of the data step is here:

output &brand._intl_all;
if outrec = 1 then output work.&brand._intl_pop;

So it is unconditionally writing every observations to the "ALL" dataset.  But only writing observations to the "INTL_POP" dataset when the OUTREC variable is set to 1.  It is not making any changes to any variables in between writing those two observations so the values for the OUTREC=1 observations should match (other than that the KEEP= dataset options will determine which variables are included).

 if put(walker_gl,$gl_code.) ne 'INTL'

So this code is saying the WALKER_GL is a CHARACTER variable.  You use character formats (start with a $) with character values.  So essentially the PUT() function is converting one string to a different string based on the format defintion.

 

FORMATS convert values to text.  INFORMATS convert text to values.

Numeric formats convert numbers to strings.  Character formats convert strings to other strings.

Character informats convert strings to other strings.  Numeric informats convert strings to numbers.

 

 

View solution in original post

5 REPLIES 5
ballardw
Super User

Is work.&brand._intl_all(keep=walker_gl principal_amt no_cc gbp no_bal no_glcode outrec) keeping just those two columns from the imported table and adding the global variables as columns initialized all to 0? 

Short answer: No

If you reference a variable on a keep statement that does not exist in a source data set or created in the body of the data step code I would expect to see a note about the variables in the  Log. Such variables would not be written to the output data.

WARNING: The variable x in the DROP, KEEP, or RENAME list has never been referenced.

 

 

What work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec) is doing? Since all are global variables only existing on the temp table mentioned above (if that is the case), does that mean that is filtering the first temp table and therefore keeping only walker_gl and principal_amt columns?  

Again if the variables are not actually anywhere in the data sources or code then you would get a message in the log similar to this for each such variable.

WARNING: The variable x in the DROP, KEEP, or RENAME list has never been referenced.

Note: Provide entire data step code.

 

Also your code is using another data set:

workspds.liq_walker_cc_&YYYY_MM.

so you have to consider that one as well. Do you know what this sort of HASH is supposed to do in a generic sense?

 

 

 

 

 

 

CSpanias
Fluorite | Level 6

Thank you for your response @ballardw , I really appreciate it. 

 

The particular program file is a part of a larger program (consisting of 10+ other programs) if that helps. You can see the program from the start up to this step below:

 

%global gbp no_cc no_bal no_glcode outrec;

%macro intl(brand,step=);

data workspds.liq_walker_cc_&YYYY_MM.;
  set basel.liq_walker_cc_full_&YYYY_MM.;
  where ((level_09_name='RET') or (level_10_name='WRT' and level_11_name <> 'WRB'));
run; 
 
data work.&brand._intl_all(keep=walker_gl principal_amt  no_cc gbp no_bal no_glcode outrec) 
        work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec);

    if _n_ eq 1 then
        do;
            declare hash CC(dataset:"workspds.liq_walker_cc_&YYYY_MM.");
            CC.definekey('walker_costcentre');
            CC.definedone();
         end;

    set &libre..liq_&brand._intl_extract_&yyyy_mm. (rename=(derived_cc=WALKER_COSTCENTRE)) end=eof;
    check= cc.find();
    
    if check ne 0 then
        do;
            no_cc = 1;
        end;
    else if currency = 'GBP' then
        do;
            gbp = 1;
        end;
    else if principal_amt <= 0 then
        do;
            no_bal = 1;
        end;
    else if put(walker_gl,$gl_code.) ne 'INTL' then
        do;
            no_glcode = 1;
        end;
    else outrec = 1;

    output &brand._intl_all;

    if outrec = 1 then output work.&brand._intl_pop;
run;

 The first DATA STEP is filtering a table (hope I got that right) which seems pretty straightforward even for me. I know the general sense of hashing, but at the moment I am just trying to figure out what this program does in a high-level and not exactly how it does it or any optimization techniques. 

In a slightly different note the line below is also confusing:

 if put(walker_gl,$gl_code.) ne 'INTL'

walker_gl is a numerical column, and it seems to be formatted using $gl_code and then compared with a string, which I am not sure how to interpret. I know "$" is used to declare a string column, but unfortunately the Foundations course it went so much!

Tom
Super User Tom
Super User

The key part of the data step is here:

output &brand._intl_all;
if outrec = 1 then output work.&brand._intl_pop;

So it is unconditionally writing every observations to the "ALL" dataset.  But only writing observations to the "INTL_POP" dataset when the OUTREC variable is set to 1.  It is not making any changes to any variables in between writing those two observations so the values for the OUTREC=1 observations should match (other than that the KEEP= dataset options will determine which variables are included).

 if put(walker_gl,$gl_code.) ne 'INTL'

So this code is saying the WALKER_GL is a CHARACTER variable.  You use character formats (start with a $) with character values.  So essentially the PUT() function is converting one string to a different string based on the format defintion.

 

FORMATS convert values to text.  INFORMATS convert text to values.

Numeric formats convert numbers to strings.  Character formats convert strings to other strings.

Character informats convert strings to other strings.  Numeric informats convert strings to numbers.

 

 

CSpanias
Fluorite | Level 6

Thank you for your analysis @Tom . 

 

I am sorry for my wrong choice of terms on my post. Your last reply regarding the key part of the data step made sense, I think I start to understand what's going on. Thank you for that!

Tom
Super User Tom
Super User
  1. imports the &libre..liq_&brand._intl_extract_&yyyy_mm. table

IMPORT is the wrong verb here.  The SET statement is going to READ the data from the dataset.  Note that TABLE is also a confusing word here.  A table could be be something that you want to generate to include in your report along with any figures or charts or graphs you might want to generate.  So you might want to import data from a table in a report, for example from an XLSX file or even a CSV file.

  1. creates two new temporary tables based on that:
    1. work.&brand._intl_all(keep=walker_gl principal_amt no_cc gbp no_bal no_glcode outrec)
    2. work.&brand._intl_pop(drop= no_cc gbp no_bal no_glcode outrec);

Note that work datasets are actually permanent files.  It is just that SAS will delete them when your session ends.

The KEEP= dataset options used with those two datasets specify which of the variables in the "data vector" of the data step will be written into those particular dataset.  Consider the difference between using the KEEP= dataset option on an output dataset and an on and input dataset. Also consider the difference between using the KEEP= dataset option on the output dataset and just use the KEEP statement in the data step.

 

  1. applies some modifications, i.e. keeping / dropping columns.

The data step does what the data step does, you have not shown enough of the data step to know what that is.  The KEEP= dataset option just determines which variables are "kept" when the dataset is written.

  1. Is work.&brand._intl_all(keep=walker_gl principal_amt no_cc gbp no_bal no_glcode outrec) keeping just those two columns from the imported table and adding the global variables as columns initialized all to 0? 

All variables referenced in a data step are variables in that data step, period. All of the variables mentioned in the KEEP= dataset option on all of the output datasets must be variables that are in this data step.

 

There is no such thing as "global variables".  Are you referring to the macro variable BRAND? That was used by the macro pre-processor to generate the name of the dataset.  So if BRAND had a value of ACME then the dataset name that SAS will see when it ready to begin running the generated SAS code will be ACME_INTL_ALL. 

 

 

 

 

 

 

 

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 881 views
  • 3 likes
  • 3 in conversation