BookmarkSubscribeRSS Feed
uncgis
Calcite | Level 5

I have programmed a lot in VBA and now I am trying to move to SAS.    In the example below, I want to determine values (start and end dates) based on one dataset and then use an IF Statement to update my working dataset.  I have tried to explain this below using my VBA code.  Is there a way to do this in SAS?

 

Problem 1.    Obtain dates from a dataset titled rstAY.    If the AY_Desc = AY1 then assign dates.    In this example, there are only three academic years (AYs).   In vba, I can tell my program to LOOP until the end of file (i.e., EOF).   

'Get AY Values
Do Until rstAY.EOF
    
    If rstAY!AY_Desc = "AY1" Then
        AY1 = rstAY!AY_Desc
        AY1_START_DATE = rstAY!AY_START_DATE
        AY1_END_DATE = rstAY!AY_END_DATE
    ElseIf rstAY!AY_Desc = "AY2" Then
        AY2 = rstAY!AY_Desc
        AY2_START_DATE = rstAY!AY_START_DATE
        AY2_END_DATE = rstAY!AY_END_DATE
    ElseIf rstAY!AY_Desc = "AY3" Then
        AY3 = rstAY!AY_Desc
        AY3_START_DATE = rstAY!AY_START_DATE
        AY3_END_DATE = rstAY!AY_END_DATE
    End If

rstAY.MoveNext
Loop

 

Problem 2.   Assign values based on the student application date.   In VBA, I am updating my working dataset (rstIn) to put the correct academic year.  I obtained the academic year (AY) above in the rstAY dataset.  Then it is a simple loop to the end of file (EOF).  At the end of the loop, I edit the table to insert the identified academic year.

'Add AY Value
Do Until rstIn.EOF
  
APPL_Date = rstIn!APPL_Date

    'If statement
    If APPL_Date >= AY1_START_DATE And APPL_Date <= AY1_END_DATE Then
        AY = AY1
    ElseIf APPL_Date >= AY2_START_DATE And APPL_Date <= AY2_END_DATE Then
        AY = AY2
    ElseIf APPL_Date >= AY3_START_DATE And APPL_Date <= AY3_END_DATE Then
        AY = AY3
    Else
        AY = "NOT FOUND"
    End If


'add value to database
rstIn.Edit
rstIn!AY_Desc = AY
rstIn.Update

rstIn.MoveNext
Loop

In VBA---I use this type of code a lot.  I am looking to see how this can be completed in SAS.   I found code for %DO Loops but I don't know how to find the number of passes in a dynamic dataset.    Do you used code to count the number of rows and then complete the DO loop OR is there another way?

 

Thanks for any help.  

7 REPLIES 7
Reeza
Super User

Data steps loop automatically, that's their defining function. 

 

Run this basic code as an example of how it operates, sashelp.class is a defined data set already created in SAS.

 

 

proc print data=sashelp.class;
run;

data want;
*input data set is class;
 set sashelp.class;

*set length of new variables;
length age_cat $15.;

*categorize age - occurs on every row;
  if age < 10 then age_cat = 'Child'
 else if age <13 then age_cat = 'Pre-Teen'
 else if age < 18 then age_cat = 'Teen';

run;

proc print data=want;
run;
ballardw
Super User

@uncgis wrote:

I have programmed a lot in VBA and now I am trying to move to SAS.    In the example below, I want to determine values (start and end dates) based on one dataset and then use an IF Statement to update my working dataset. 


 

 

If you have made your dates in SAS properly then they are numeric in nature and the best way to get "start" and "end" are typically the minimum and maximum values.

Best would be to provide a small example of the data involved and the desired result.

If I understand your VB snippet then you are creating multiple variables based on a group indicator variable (???) named AY_Desc. That is almost certainly not the way to go about things in SAS.

Matching groups of observations (records) based on the value of one or more variables is typically done with BY group processing or possible in Proc SQL Join on criteria.

 

Once you have SAS date values then there are a number of tools to work with them. One extremely powerful tool in SAS is the concept of FORMAT, which is how values are displayed for mere humans to read them. For things that require a single variable to create a group the format is generally one of the most flexible because it does not require adding any variables to a data set at all. The groups created by format assignment are honored by reporting and analysis procedures and for many roles in graphing (custom date intervals on axis being one likely not to work).

A small example with a data set you should have.

proc format;
value firstagegrp
1-12='Preteen'
13-17='Teen'
;
value secondagegrp
12-14= '12 to 14'
15-16= '15 to 16'
;

proc freq data=sashelp.class;
   Title 'First age grouping';
   tables age;
   format age firstagegrp.;
run; title;

proc freq data=sashelp.class;
   Title 'Second age grouping';
   tables age;
   format age secondagegrp.;
run; title;

Note that the output for the second age grouping still displays values individually that were not assigned a group.

 

The syntax of the formats does allow using a data set, properly structured to create the format. So if you have the start/end pairs with a value that assigns the AY1 AY2 AY3 values then a Format could be used with the Appl_date variable without adding a variable. Features of formats to allow an OTHER='text to display' for ranges not found.

 

 

uncgis
Calcite | Level 5

I appreciate your help.  I am getting closer to the answer with your help.   I have used the IF statement and converted the dates to a numeric value. 

But I have hard-coded the start and end dates.   Is there a way to pull that data into this set of code?   

Data apps;
VAR_AY1_Start_Date = input ('2019-01-01', yymmdd10.);
VAR_AY1_End_Date = input ('2019-12-31', yymmdd10.);

VAR_AY2_Start_Date = input ('2020-01-01', yymmdd10.);
VAR_AY2_End_Date = input ('2020-12-31', yymmdd10.);

VAR_AY3_Start_Date = input ('2021-01-01', yymmdd10.);
VAR_AY3_End_Date = input ('2021-12-31', yymmdd10.);

Set work.query_applications;
Var_Appl_Date = input(Appl_Date,yymmdd10.);

Length NEW_AY_DESC $15.;

IF VAR_Appl_DATE  >= VAR_AY3_Start_Date AND VAR_Appl_DATE  <= VAR_AY3_END_Date THEN NEW_AY_DESC = 'AY3';
Else IF VAR_Appl_DATE  >= VAR_AY2_Start_Date AND VAR_Appl_DATE  <= VAR_AY2_END_Date THEN NEW_AY_DESC = 'AY2';
Else If VAR_Appl_DATE  >= VAR_AY1_Start_Date AND VAR_Appl_DATE  <= VAR_AY1_END_Date THEN NEW_AY_DESC = 'AY1';
ELSE NEW_AY_DESC = 'FALSE';


RUN;

 

ballardw
Super User

"Hard code" or use a date literal as "01JAN2019"d.  The d after the quoted value in ddMONyyyy form is how SAS wants a data literal value since there are so many different ways of expressing dates with just digits.

 

If this were my code I would not introduce any new variables, just the values in the correct place. Assuming that I actually had dates that needed to be bounds.

 

data apps;
   set work.query_applications;
   Var_appl_date= input(Appl_date,yymmdd10.);
   length New_AY_DESC $ 5;
   if '01JAN2019'd le Var_appl_date le '31Dec2019'd then New_AY_Desc='AY1';
   else if '01JAN2020'd le Var_appl_date le '31Dec2020'd then New_AY_Desc='AY2';
   else if '01JAN2021'd le Var_appl_date le '31Dec2021'd then New_AY_Desc='AY3';
   else New_AY_Desc='FALSE';
run;

However with that specific range of dates, which correspond to calendar years I would use information about the YEAR of the application date.

 

data apps;
   set work.query_applications;
   Var_appl_date= input(Appl_date,yymmdd10.);
   length New_AY_DESC $ 5;
   select ( year(Var_appl_date) );
      when (2019) New_AY_Desc='AY1'; 
      when (2020) New_AY_Desc='AY2'; 
      when (2021) New_AY_Desc='AY3'; 
      otherwise   New_AY_Desc='False';
   end;
run;

The Select evaluates an expression and then branches to the WHEN (value) that holds the result. Otherwise is the result if none of the results match the provided lists. The When can use a comma delimited list of values.

 

Actually with a sequential list of years I might be tempted to get the offset from the first value and concatenate an expression, but those are easy to make confusing and forget what was going on when you haven't looked at the code for a while and need to update for the next year. Which is one reason to look at FORMATS for such assignments based on single variables: One Format is much easier to change than to find all the places you looked at multiple values and insert the new one.

 

Note: The LENGTH for the New_AY_Desc is needed because the behavior with SAS for newly created character variable is if a length is not set prior then the first use will establish the length. So with your code using values like 'AY3' the length would be set to three characters and you would be wondering why you have values of "FAL" for the other years of data.

Tom
Super User Tom
Super User

Now it looks like you just want the YEAR() function to figure which year the date appears in.

You might want to just create a FORMAT that maps the year to your labels.

proc format ;
value ay
  2019 = 'AY1'
  2020 = 'AY2'
  2021 = 'AY3'
 other = 'FALSE'
;
run;

Now your program might look like just this:

data apps;
   set query_applications;
   year = year(appl_date);
   format year ay.;
run;
Reeza
Super User
proc format ;
value ay
  2019 = 'AY1'
  2020 = 'AY2'
  2021 = 'AY3'
 other = 'FALSE'
;
run;


data apps;
   set query_applications;
   appl_year = year(appl_date);
   new_ay_desc = put(appl_year , ay.);
run;

Slight variation on @Tom's solution that still creates your new variable. 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 2683 views
  • 4 likes
  • 4 in conversation