Solved: How to set up multiple conditonal code ?

Phil_NZ · Posted 06-12-2021 03:13 AM

Hi all,

Have a fabulous weekend,

I have a conditional code below

The idea is that: my dataset for these countries is from 1989 to 2020 (yr from 1989 to 2020). But I just want to keep the sample from 1990 to 2020

Afterward, I want to create a variable called pt. The pt variable receive the value of 0 if geogn="UNITEDS" and the year is from 1990 to 1992 (one year before 1993) and receiving the value of 1 if geogn="UNITEDS" and the year is from 1993 to 2020. Similarly, the pt variable receiving the value of 1 if geogn="SOUTHKOREA" and the year is from 1997 to 2020 and receive the value of 0 from 1990 to 1996(one year before 1997).

A similar creation for other countries, where 1993, 1997, 2000,2001, 2004, 2018 are law implementation years of the countries.

My novice code is as below

data gen_post_treat;
 set matching;
 yr=input(year, ?? 32.);/*year is a character variable, so I convert to numeric variable*/
 if  geogn="UNITEDS"       & yr in (1993:2020)
   | geogn="SOUTHKOREA"    & yr in (1997:2020)
   | geogn="BRAZIL"        & yr in (2000:2020)
   | geogn="CANADA"        & yr in (2000:2020)
   | geogn="CZECH"         & yr in (2001:2020)
   | geogn="LATVIA"        & yr in (2004:2020)
   | geogn="VIETNAM"       & yr in (2018:2020)
then pt=1;
else if yr ne 1989 then pt=0;
run;

And last but not least, I want to create a variable called the converted year (named cvt_yr) that focuses on 9 years surrounding the law implementation years.

For example, in SOUTHKOREA case:

cvt_yr= -4 if geogn="SOUTHKOREA"    & yr=1993
cvt_yr= -3 if geogn="SOUTHKOREA"    & yr=1994
cvt_yr= -2 if geogn="SOUTHKOREA"    & yr=1995
cvt_yr= -1 if geogn="SOUTHKOREA"    & yr=1996
cvt_yr=  0 if geogn="SOUTHKOREA"    & yr=1997
cvt_yr= 1 if geogn="SOUTHKOREA"    & yr=1998
....
cvt_yr= 5 if geogn="SOUTHKOREA"    & yr=2002

Similarly

cvt_yr= -4 if geogn="CZECH"    & yr=1997
...
cvt_yr= 0 if geogn="CZECH"    & yr=2001
...
cvt_yr= 5 if geogn="CZECH"    & yr=2006

Similarly applied for other countries.

For UNITEDS, we only trace back to 1990 due to data limitation

cvt_yr= -3 if geogn="UNITEDS"    & yr=1990

and for Vietnam, we only trace forward to 2020 due to data limitation

cvt_yr= 2 if geogn="VIETNAM"    & yr=2020

Please let me know if my explanation is not clear.

Thank you so much and warm regards.

Phil.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

jimbarbour · Posted 06-12-2021 07:24 PM

@Phil_NZ ,

There are many ways to do this. However, when I see the exact same process being done over and over but each time to a different entity, my mind immediately goes to macro processing. Just as you were, I'm going to use a Data step to set PT and Cvt_Yr, but I'm going to put the Data step inside a macro wrapper and pass parameters to the macro. Code once, parameterize, pass in parameters. This actually minimizes code in the long run although if it's a "one and done" process that you'll never see again, it may not be worth it.

My code is below. Comments:

First I build a little data set called Law_Years. This data drives the overall process. The nice thing about this code is that if you get additional countries, you add them to the Law_Years table and you do not have to change any code. As I read each record, I load a macro array containing the GeoGn and the Law_Year.

Second, I read in some test data that I just typed up based on your original post.

Third, I get a count of the number of distinct nations and put it into a macro variable. I could have gotten the count in step one, but this step eliminates the possibility of duplicates.

Fourth, I have a macro with the Data step that will be setting PT and Cvt_Yr. Passed in as parameters are the GeoGn, the StartYr, the EndYr, and the number corresponding to a given GeoGn's position in the macro array. An intermediate dataset will be created using this number.

Fifth, I run the Data step using a second macro. All the second macro does is iterate through the macro array, format the parameters, and invoke the first macro.

Sixth, I have a little macro that all it does is list out the names of the intermediate data sets.

Seventh and last, I have a step that combines all the intermediate datasets into one.

Give it a try with your data and see how it goes.

DATA	Law_Years;

	DROP	_:;
	LENGTH	GeoGn	$12;
	LENGTH	LawYear	$4;
	INPUT
		GeoGn	$
		LawYear
		;
	_Count	+	1;
	CALL	SYMPUTX(CATS('GeoGn', PUT(_Count, 3.)), GeoGn);
	CALL	SYMPUTX(CATS('LawYear', PUT(_Count, 3.)), LawYear);
DATALINES;
UNITEDS  1993
SOUTHKOREA  1997
BRAZIL  2000
CANADA  2000
CZECH  2001
LATVIA  2004
VIETNAM  2018
;
RUN;

DATA	Matching;
	LENGTH	GeoGn	$12;
	LENGTH	Year	$4;
	INPUT	GeoGn	$
			Year	$
			;
DATALINES;
SOUTHKOREA  1989
SOUTHKOREA  1993
SOUTHKOREA  1994
SOUTHKOREA  1995
SOUTHKOREA  1996
SOUTHKOREA  1997
SOUTHKOREA  1998
SOUTHKOREA  1999
SOUTHKOREA  2000
SOUTHKOREA  2001
SOUTHKOREA  2002
SOUTHKOREA  2021
CZECH  1989
CZECH  1997
CZECH  1998
CZECH  1999
CZECH  2000
CZECH  2001
CZECH  2002
CZECH  2003
CZECH  2004
CZECH  2005
CZECH  2006
CZECH  2021
;
RUN;

PROC	SQL	NOPRINT;
	SELECT	COUNT(DISTINCT  GeoGn)
		INTO	:	Nations_Cnt
		FROM		Law_Years;
QUIT;
%PUT  NOTE:  &=Nations_Cnt;

%MACRO	Set_PT(GeoGn, StartYr, EndYr, DSN_Nbr);
	DATA	gen_post_treat_&DSN_Nbr;
		set matching;
		yr=input(year, ?? 32.);	/*year is a character variable, so I convert to numeric variable*/

		IF	STRIP(UPCASE(GeoGn))	=	"&GeoGn"	AND
			1990		<=	Yr		<=	2020;

		IF	&StartYr	<=	Yr		<=	&EndYr		THEN
			PT						=	1;
		ELSE
			PT						=	0;

		Cvt_Yr						=	Yr	-	&StartYr;
	RUN;
%MEND	Set_PT;

%MACRO	Create_Datasets;
	%LOCAL	i;
	%DO	i	=	1	%TO	&Nations_Cnt;
		%Set_PT(&&GeoGn&i,	&&LawYear&i, 2020, &i);
	%END;
%MEND	Create_Datasets;
%Create_Datasets;

%MACRO	List_Dataset_Names;
	%LOCAL	i;
	%DO	i	=	1	%TO	&Nations_Cnt;
		Gen_Post_Treat_%SYSFUNC(STRIP(&i))
	%END;
%MEND	List_Dataset_Names;

DATA	Gen_Post_Treat;
	SET
		%List_Dataset_Names;
		;
RUN;

Jim

View solution in original post

jimbarbour · Posted 06-12-2021 07:24 PM

@Phil_NZ ,

There are many ways to do this. However, when I see the exact same process being done over and over but each time to a different entity, my mind immediately goes to macro processing. Just as you were, I'm going to use a Data step to set PT and Cvt_Yr, but I'm going to put the Data step inside a macro wrapper and pass parameters to the macro. Code once, parameterize, pass in parameters. This actually minimizes code in the long run although if it's a "one and done" process that you'll never see again, it may not be worth it.

My code is below. Comments:

First I build a little data set called Law_Years. This data drives the overall process. The nice thing about this code is that if you get additional countries, you add them to the Law_Years table and you do not have to change any code. As I read each record, I load a macro array containing the GeoGn and the Law_Year.

Second, I read in some test data that I just typed up based on your original post.

Third, I get a count of the number of distinct nations and put it into a macro variable. I could have gotten the count in step one, but this step eliminates the possibility of duplicates.

Fourth, I have a macro with the Data step that will be setting PT and Cvt_Yr. Passed in as parameters are the GeoGn, the StartYr, the EndYr, and the number corresponding to a given GeoGn's position in the macro array. An intermediate dataset will be created using this number.

Fifth, I run the Data step using a second macro. All the second macro does is iterate through the macro array, format the parameters, and invoke the first macro.

Sixth, I have a little macro that all it does is list out the names of the intermediate data sets.

Seventh and last, I have a step that combines all the intermediate datasets into one.

Give it a try with your data and see how it goes.

DATA	Law_Years;

	DROP	_:;
	LENGTH	GeoGn	$12;
	LENGTH	LawYear	$4;
	INPUT
		GeoGn	$
		LawYear
		;
	_Count	+	1;
	CALL	SYMPUTX(CATS('GeoGn', PUT(_Count, 3.)), GeoGn);
	CALL	SYMPUTX(CATS('LawYear', PUT(_Count, 3.)), LawYear);
DATALINES;
UNITEDS  1993
SOUTHKOREA  1997
BRAZIL  2000
CANADA  2000
CZECH  2001
LATVIA  2004
VIETNAM  2018
;
RUN;

DATA	Matching;
	LENGTH	GeoGn	$12;
	LENGTH	Year	$4;
	INPUT	GeoGn	$
			Year	$
			;
DATALINES;
SOUTHKOREA  1989
SOUTHKOREA  1993
SOUTHKOREA  1994
SOUTHKOREA  1995
SOUTHKOREA  1996
SOUTHKOREA  1997
SOUTHKOREA  1998
SOUTHKOREA  1999
SOUTHKOREA  2000
SOUTHKOREA  2001
SOUTHKOREA  2002
SOUTHKOREA  2021
CZECH  1989
CZECH  1997
CZECH  1998
CZECH  1999
CZECH  2000
CZECH  2001
CZECH  2002
CZECH  2003
CZECH  2004
CZECH  2005
CZECH  2006
CZECH  2021
;
RUN;

PROC	SQL	NOPRINT;
	SELECT	COUNT(DISTINCT  GeoGn)
		INTO	:	Nations_Cnt
		FROM		Law_Years;
QUIT;
%PUT  NOTE:  &=Nations_Cnt;

%MACRO	Set_PT(GeoGn, StartYr, EndYr, DSN_Nbr);
	DATA	gen_post_treat_&DSN_Nbr;
		set matching;
		yr=input(year, ?? 32.);	/*year is a character variable, so I convert to numeric variable*/

		IF	STRIP(UPCASE(GeoGn))	=	"&GeoGn"	AND
			1990		<=	Yr		<=	2020;

		IF	&StartYr	<=	Yr		<=	&EndYr		THEN
			PT						=	1;
		ELSE
			PT						=	0;

		Cvt_Yr						=	Yr	-	&StartYr;
	RUN;
%MEND	Set_PT;

%MACRO	Create_Datasets;
	%LOCAL	i;
	%DO	i	=	1	%TO	&Nations_Cnt;
		%Set_PT(&&GeoGn&i,	&&LawYear&i, 2020, &i);
	%END;
%MEND	Create_Datasets;
%Create_Datasets;

%MACRO	List_Dataset_Names;
	%LOCAL	i;
	%DO	i	=	1	%TO	&Nations_Cnt;
		Gen_Post_Treat_%SYSFUNC(STRIP(&i))
	%END;
%MEND	List_Dataset_Names;

DATA	Gen_Post_Treat;
	SET
		%List_Dataset_Names;
		;
RUN;

Jim

Phil_NZ · Posted 06-12-2021 08:24 PM

Hi @jimbarbour

I have a couple of questions in advance here, your code is quite a bit beyond my knowledge so I need to print it out and read it carefully.

1. What does the fourth dimension in your macro Set_PT mean? I did not see you refer it previously.

2. What is the purpose of creating the dataset "Matching" with not continuous years like that? Or as you said, you just used the number already from my post? Ahh, or do you mean "matching" is my original dataset? So, does it mean that, in my case, I can create a dataset Nations_Cnt directly by using PROC SQL from dataset matching without creating the dataset Law_Years like your?

3. whether the code

%PUT  NOTE:  &=Nations_Cnt;

just to write down to the log that we are heading to a specific country?

4. And I believe that, you write some abbreviation with purpose, can I ask what do "DSN_Nrb" and "Nations_Cnt" stand for, that I can explain these words and the situation easier?

Thank you so much

Phil.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.

jimbarbour · Posted 06-12-2021 08:57 PM

@Phil_NZ wrote:

1. What does the fourth dimension in your macro Set_PT mean? I did not see you refer it previously.

@Phil_NZ , the macro processes one country at a time. The Data step writes out a file named, "gen_post_treat." However, if I use the same filename for every country, the 2nd country will overwrite the first, the 3rd the 2nd, and so on. So what I do is pass in the fourth parameter, &i. The macro variable &i is just a number, a counter, that starts at one and goes up to however many countries there are in your Law_Year table. This number is suffixed onto the file name so that an intermediate file is created with a unique name for each country.

@Phil_NZ wrote:

2. What is the purpose of creating the dataset "Matching" with not continuous years like that? Or as you said, you just used the number already from my post?

Well, mostly I took the examples from your original post. I added 1989 and 2021 to test years that fall out of the range of consideration. Basically, anything that is before 1990 or after 2020, we want to exclude. By putting 1989 and 2021 into the data, I test my range. If my code filters out 1989 and 2021, then I have validated that the code is working properly.

@Phil_NZ wrote:

3. whether the code
%PUT  NOTE:  &=Nations_Cnt;
just to write down to the log that we are heading to a specific country?

%PUT does indeed write to the log. The %PUT immediately follows a Proc SQL step that creates a macro variable. I want to know if the value of the macro variable is correct, so I write it to the log. In the log, it looks like the below. Nations count is simple a count of how many nations there are in the Law_Year table. In this case, there are seven (US, S Korea, Brazil, Canada, Czech Republic, Latvia, and Vietnam).

@Phil_NZ wrote:

4. And I believe that, you write some abbreviation with purpose, can I ask what do "DSN_Nrb" and "Nations_Cnt" stand for, that I can explain these words and the situation easier?

DSN_Nbr is just a number from one to the number of nations. Nations_Cnt is a count of the number of nations in the Law_Year table. You gave 7 examples in your original post. Elsewhere you mentioned that there are I believe 64. "DSN" stands for Data Set Name. Therefore DSN_Nbr is the number that is suffixed on to the back end of the data set name to make each data set name unique.

Here, below, using the examples from your original post, are the intermediate data sets I created, one for each country. Notice how there is a number on the back end of each dataset name. The number on the back of each dataset name is the dataset name number or DSN_Nbr.

I hope that helps, but this can't be easy code if someone is new to SAS. However, you did ask for code that would make the job of working with 64 nations easier, so... 🙂

In the Data step, I'm excluding all the data that is not applicable and setting all the values doing everything in just three commands: Two IF statements and one mathematical expression. I can't make it much more concise than that. Three commands for 64 nations is fairly reasonable I think.

Jim

How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

Re: How to set up multiple conditonal code ?

SAS Innovate 2025: Call for Content

Click image to register for webinar

Classroom Training Available!