I have programmed a lot in VBA and now I am trying to move to SAS. In the example below, I want to determine values (start and end dates) based on one dataset and then use an IF Statement to update my working dataset. I have tried to explain this below using my VBA code. Is there a way to do this in SAS?
Problem 1. Obtain dates from a dataset titled rstAY. If the AY_Desc = AY1 then assign dates. In this example, there are only three academic years (AYs). In vba, I can tell my program to LOOP until the end of file (i.e., EOF).
'Get AY Values
Do Until rstAY.EOF
If rstAY!AY_Desc = "AY1" Then
AY1 = rstAY!AY_Desc
AY1_START_DATE = rstAY!AY_START_DATE
AY1_END_DATE = rstAY!AY_END_DATE
ElseIf rstAY!AY_Desc = "AY2" Then
AY2 = rstAY!AY_Desc
AY2_START_DATE = rstAY!AY_START_DATE
AY2_END_DATE = rstAY!AY_END_DATE
ElseIf rstAY!AY_Desc = "AY3" Then
AY3 = rstAY!AY_Desc
AY3_START_DATE = rstAY!AY_START_DATE
AY3_END_DATE = rstAY!AY_END_DATE
End If
rstAY.MoveNext
Loop
Problem 2. Assign values based on the student application date. In VBA, I am updating my working dataset (rstIn) to put the correct academic year. I obtained the academic year (AY) above in the rstAY dataset. Then it is a simple loop to the end of file (EOF). At the end of the loop, I edit the table to insert the identified academic year.
'Add AY Value
Do Until rstIn.EOF
APPL_Date = rstIn!APPL_Date
'If statement
If APPL_Date >= AY1_START_DATE And APPL_Date <= AY1_END_DATE Then
AY = AY1
ElseIf APPL_Date >= AY2_START_DATE And APPL_Date <= AY2_END_DATE Then
AY = AY2
ElseIf APPL_Date >= AY3_START_DATE And APPL_Date <= AY3_END_DATE Then
AY = AY3
Else
AY = "NOT FOUND"
End If
'add value to database
rstIn.Edit
rstIn!AY_Desc = AY
rstIn.Update
rstIn.MoveNext
Loop
In VBA---I use this type of code a lot. I am looking to see how this can be completed in SAS. I found code for %DO Loops but I don't know how to find the number of passes in a dynamic dataset. Do you used code to count the number of rows and then complete the DO loop OR is there another way?
Thanks for any help.
Data steps loop automatically, that's their defining function.
Run this basic code as an example of how it operates, sashelp.class is a defined data set already created in SAS.
proc print data=sashelp.class;
run;
data want;
*input data set is class;
set sashelp.class;
*set length of new variables;
length age_cat $15.;
*categorize age - occurs on every row;
if age < 10 then age_cat = 'Child'
else if age <13 then age_cat = 'Pre-Teen'
else if age < 18 then age_cat = 'Teen';
run;
proc print data=want;
run;
@uncgis wrote:
I have programmed a lot in VBA and now I am trying to move to SAS. In the example below, I want to determine values (start and end dates) based on one dataset and then use an IF Statement to update my working dataset.
If you have made your dates in SAS properly then they are numeric in nature and the best way to get "start" and "end" are typically the minimum and maximum values.
Best would be to provide a small example of the data involved and the desired result.
If I understand your VB snippet then you are creating multiple variables based on a group indicator variable (???) named AY_Desc. That is almost certainly not the way to go about things in SAS.
Matching groups of observations (records) based on the value of one or more variables is typically done with BY group processing or possible in Proc SQL Join on criteria.
Once you have SAS date values then there are a number of tools to work with them. One extremely powerful tool in SAS is the concept of FORMAT, which is how values are displayed for mere humans to read them. For things that require a single variable to create a group the format is generally one of the most flexible because it does not require adding any variables to a data set at all. The groups created by format assignment are honored by reporting and analysis procedures and for many roles in graphing (custom date intervals on axis being one likely not to work).
A small example with a data set you should have.
proc format; value firstagegrp 1-12='Preteen' 13-17='Teen' ; value secondagegrp 12-14= '12 to 14' 15-16= '15 to 16' ; proc freq data=sashelp.class; Title 'First age grouping'; tables age; format age firstagegrp.; run; title; proc freq data=sashelp.class; Title 'Second age grouping'; tables age; format age secondagegrp.; run; title;
Note that the output for the second age grouping still displays values individually that were not assigned a group.
The syntax of the formats does allow using a data set, properly structured to create the format. So if you have the start/end pairs with a value that assigns the AY1 AY2 AY3 values then a Format could be used with the Appl_date variable without adding a variable. Features of formats to allow an OTHER='text to display' for ranges not found.
I appreciate your help. I am getting closer to the answer with your help. I have used the IF statement and converted the dates to a numeric value.
But I have hard-coded the start and end dates. Is there a way to pull that data into this set of code?
Data apps;
VAR_AY1_Start_Date = input ('2019-01-01', yymmdd10.);
VAR_AY1_End_Date = input ('2019-12-31', yymmdd10.);
VAR_AY2_Start_Date = input ('2020-01-01', yymmdd10.);
VAR_AY2_End_Date = input ('2020-12-31', yymmdd10.);
VAR_AY3_Start_Date = input ('2021-01-01', yymmdd10.);
VAR_AY3_End_Date = input ('2021-12-31', yymmdd10.);
Set work.query_applications;
Var_Appl_Date = input(Appl_Date,yymmdd10.);
Length NEW_AY_DESC $15.;
IF VAR_Appl_DATE >= VAR_AY3_Start_Date AND VAR_Appl_DATE <= VAR_AY3_END_Date THEN NEW_AY_DESC = 'AY3';
Else IF VAR_Appl_DATE >= VAR_AY2_Start_Date AND VAR_Appl_DATE <= VAR_AY2_END_Date THEN NEW_AY_DESC = 'AY2';
Else If VAR_Appl_DATE >= VAR_AY1_Start_Date AND VAR_Appl_DATE <= VAR_AY1_END_Date THEN NEW_AY_DESC = 'AY1';
ELSE NEW_AY_DESC = 'FALSE';
RUN;
"Hard code" or use a date literal as "01JAN2019"d. The d after the quoted value in ddMONyyyy form is how SAS wants a data literal value since there are so many different ways of expressing dates with just digits.
If this were my code I would not introduce any new variables, just the values in the correct place. Assuming that I actually had dates that needed to be bounds.
data apps; set work.query_applications; Var_appl_date= input(Appl_date,yymmdd10.); length New_AY_DESC $ 5; if '01JAN2019'd le Var_appl_date le '31Dec2019'd then New_AY_Desc='AY1'; else if '01JAN2020'd le Var_appl_date le '31Dec2020'd then New_AY_Desc='AY2'; else if '01JAN2021'd le Var_appl_date le '31Dec2021'd then New_AY_Desc='AY3'; else New_AY_Desc='FALSE';
run;
However with that specific range of dates, which correspond to calendar years I would use information about the YEAR of the application date.
data apps; set work.query_applications; Var_appl_date= input(Appl_date,yymmdd10.); length New_AY_DESC $ 5; select ( year(Var_appl_date) ); when (2019) New_AY_Desc='AY1'; when (2020) New_AY_Desc='AY2'; when (2021) New_AY_Desc='AY3'; otherwise New_AY_Desc='False'; end; run;
The Select evaluates an expression and then branches to the WHEN (value) that holds the result. Otherwise is the result if none of the results match the provided lists. The When can use a comma delimited list of values.
Actually with a sequential list of years I might be tempted to get the offset from the first value and concatenate an expression, but those are easy to make confusing and forget what was going on when you haven't looked at the code for a while and need to update for the next year. Which is one reason to look at FORMATS for such assignments based on single variables: One Format is much easier to change than to find all the places you looked at multiple values and insert the new one.
Note: The LENGTH for the New_AY_Desc is needed because the behavior with SAS for newly created character variable is if a length is not set prior then the first use will establish the length. So with your code using values like 'AY3' the length would be set to three characters and you would be wondering why you have values of "FAL" for the other years of data.
Now it looks like you just want the YEAR() function to figure which year the date appears in.
You might want to just create a FORMAT that maps the year to your labels.
proc format ;
value ay
2019 = 'AY1'
2020 = 'AY2'
2021 = 'AY3'
other = 'FALSE'
;
run;
Now your program might look like just this:
data apps;
set query_applications;
year = year(appl_date);
format year ay.;
run;
proc format ;
value ay
2019 = 'AY1'
2020 = 'AY2'
2021 = 'AY3'
other = 'FALSE'
;
run;
data apps;
set query_applications;
appl_year = year(appl_date);
new_ay_desc = put(appl_year , ay.);
run;
Slight variation on @Tom's solution that still creates your new variable.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.