Hello,
we have many SAS programs which contains a by statement follow by a if first statement.
At the begining, the variables into the by statement seems to have a certain logic (good variables grouping)
but with time, we need to add more variables and I wonder how can we check if this new grouping does not change
the logic as it was first set.
Old version:
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
BY MOIS TERR PROVINCE_CD CIE UNDERWRITING_COMPANY_CD BRANCH BROKER TRANSFER APER POLICY DEBU_TER EFFDAT TRSNUM CODE_TX
DESCENDING PRIME TYPE3 POLTYP ;
POLCOUNT=INPUT(PUT(0,15.5),15.5);
IF FIRST.code_tx ;
IF CODE_TX IN ('-','1','5','3') THEN POLCOUNT= TERME/12;
ELSE IF CODE_TX IN ('2','6','4') THEN POLCOUNT= -1*TERME/12;
RUN;
New Version:
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
BY MOIS TERR PROVINCE_CD CIE UNDERWRITING_COMPANY_CD BRANCH AFFINITY_IND GCNA_BRANCH BROKER TRANSFER APER
POLICY POLTYP DEBU_TER EFFDAT TRSNUM TYPE3 POLICY_CATEGORY NETWORK_PROVIDER CODE_TX DESCENDING PRIME ;
POLCOUNT=INPUT(PUT(0,15.5),15.5);
IF FIRST.code_tx ;
IF CODE_TX IN ('-','1','5','3') THEN POLCOUNT= TERME/12;
ELSE IF CODE_TX IN ('2','6','4') THEN POLCOUNT= -1*TERME/12;
RUN;
It is not clear what you are asking, or even what the purpose of the code it.
It kind of looks like you are trying take the distinct observations based on a long list of key variables. To make sure which of the multiple observations you are taking you appear to have added a variable named PRIME ( and possibly others) so that the one you want is the first.
Which set of variables are changing? The key variables that define the BY group? Or the variables that set the ordering of the repetitions within the by groups?
You could place the lists into macro variables:
%let keys=MOIS TERR PROVINCE_CD CIE UNDERWRITING_COMPANY_CD BRANCH BROKER TRANSFER APER
POLICY DEBU_TER EFFDAT TRSNUM CODE_TX
;
%let sorters=DESCENDING PRIME TYPE3 POLTYP ;
%let last_key = %scan(&keys,-1,%str( ));
And then you can use the macro variables to generate the code. That way you only have the make changes to the lists in one place.
proc sort data=temp.polentr;
by &keys &sorters;
run;
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
BY &keys &sorters;
IF FIRST.&last_key ;
POLCOUNT=INPUT(PUT(0,15.5),15.5);
IF CODE_TX IN ('-','1','5','3') THEN POLCOUNT= TERME/12;
ELSE IF CODE_TX IN ('2','6','4') THEN POLCOUNT= -1*TERME/12;
RUN;
Since we don't know your data, and can't answer your concerns directly, let me ask you ... what happens if you run this code ... does it do what you want?
Also, I strongly recommend that you not overwrite existing data sets by creating a new data set with the exact same data set name, like this:
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
This is poor programming practice. Better would be:
DATA TEMP.POLENTR1;
SET TEMP.POLENTR;
or
DATA work.POLENTR;
SET TEMP.POLENTR;
Without data and definitions of what "good variables grouping" means in terms of the data this really can't be answered by us.
One comment, as it relates to even testing the behavior of this code is that this structure use the same data set name on Set and Data statements completely replaces the data set.
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
I would be tempted to run both of your codes (with appropriate sorts in between) generating a set with the "old" form as one data set and the "new" into a different. Then compare the values of the variables that you consider needed to define "good variable grouping" between the two.
Pretty much can guarantee that unless the added variables only have one value with the BY variables that precede them that you will have more instances of the "old" By variable combinations.
It is not clear what you are asking, or even what the purpose of the code it.
It kind of looks like you are trying take the distinct observations based on a long list of key variables. To make sure which of the multiple observations you are taking you appear to have added a variable named PRIME ( and possibly others) so that the one you want is the first.
Which set of variables are changing? The key variables that define the BY group? Or the variables that set the ordering of the repetitions within the by groups?
You could place the lists into macro variables:
%let keys=MOIS TERR PROVINCE_CD CIE UNDERWRITING_COMPANY_CD BRANCH BROKER TRANSFER APER
POLICY DEBU_TER EFFDAT TRSNUM CODE_TX
;
%let sorters=DESCENDING PRIME TYPE3 POLTYP ;
%let last_key = %scan(&keys,-1,%str( ));
And then you can use the macro variables to generate the code. That way you only have the make changes to the lists in one place.
proc sort data=temp.polentr;
by &keys &sorters;
run;
DATA TEMP.POLENTR;
SET TEMP.POLENTR;
BY &keys &sorters;
IF FIRST.&last_key ;
POLCOUNT=INPUT(PUT(0,15.5),15.5);
IF CODE_TX IN ('-','1','5','3') THEN POLCOUNT= TERME/12;
ELSE IF CODE_TX IN ('2','6','4') THEN POLCOUNT= -1*TERME/12;
RUN;
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.