BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Phil_NZ
Barite | Level 11

Hi all,

Have a fabulous weekend,

I have a conditional code below

The idea is that: my dataset for these countries is from 1989 to 2020 (yr from 1989 to 2020). But I just want to keep the sample from 1990 to 2020

 

Afterward, I want to create a variable called pt. The pt variable receive the value of 0 if geogn="UNITEDS" and the year is from 1990 to 1992 (one year before 1993) and receiving the value of 1 if geogn="UNITEDS" and the year is from 1993 to 2020. Similarly,  the pt variable receiving the value of 1 if geogn="SOUTHKOREA" and the year is from 1997 to 2020 and receive the value of 0 from 1990 to 1996(one year before 1997).

A similar creation for other countries, where 1993, 1997, 2000,2001, 2004, 2018 are law implementation years of the countries.

My novice code is as below

data gen_post_treat;
 set matching;
 yr=input(year, ?? 32.);/*year is a character variable, so I convert to numeric variable*/
 if  geogn="UNITEDS"       & yr in (1993:2020)
   | geogn="SOUTHKOREA"    & yr in (1997:2020)
   | geogn="BRAZIL"        & yr in (2000:2020)
   | geogn="CANADA"        & yr in (2000:2020)
   | geogn="CZECH"         & yr in (2001:2020)
   | geogn="LATVIA"        & yr in (2004:2020)
   | geogn="VIETNAM"       & yr in (2018:2020)
then pt=1;
else if yr ne 1989 then pt=0;
run;

 

And last but not least, I want to create a variable called the converted year (named cvt_yr) that focuses on 9 years surrounding the law implementation years.

For example, in SOUTHKOREA case:

 

cvt_yr= -4 if geogn="SOUTHKOREA"    & yr=1993
cvt_yr= -3 if geogn="SOUTHKOREA"    & yr=1994
cvt_yr= -2 if geogn="SOUTHKOREA"    & yr=1995
cvt_yr= -1 if geogn="SOUTHKOREA"    & yr=1996
cvt_yr=  0 if geogn="SOUTHKOREA"    & yr=1997
cvt_yr= 1 if geogn="SOUTHKOREA"    & yr=1998
....
cvt_yr= 5 if geogn="SOUTHKOREA"    & yr=2002

Similarly

cvt_yr= -4 if geogn="CZECH"    & yr=1997
...
cvt_yr= 0 if geogn="CZECH" & yr=2001
...
cvt_yr= 5 if geogn="CZECH" & yr=2006

 

Similarly applied for other countries.

For UNITEDS, we only trace back to 1990 due to data limitation

cvt_yr= -3 if geogn="UNITEDS"    & yr=1990

and for Vietnam, we only trace forward to 2020 due to data limitation

cvt_yr= 2 if geogn="VIETNAM"    & yr=2020

Please let me know if my explanation is not clear.

 

Thank you so much and warm regards.

Phil.

 

 

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
1 ACCEPTED SOLUTION

Accepted Solutions
jimbarbour
Meteorite | Level 14

@Phil_NZ ,

 

There are many ways to do this.  However, when I see the exact same process being done over and over but each time to a different entity, my mind immediately goes to macro processing.  Just as you were, I'm going to use a Data step to set PT and Cvt_Yr, but I'm going to put the Data step inside a macro wrapper and pass parameters to the macro.  Code once, parameterize, pass in parameters.  This actually minimizes code in the long run although if it's a "one and done" process that you'll never see again, it may not be worth it.

 

My code is below.  Comments: 

First I build a little data  set called Law_Years.  This data drives the overall process.  The nice thing about this code is that if you get additional countries, you add them to the Law_Years table and you do not have to change any code.  As I read each record, I load a macro array containing the GeoGn and the Law_Year.

Second, I read in some test data that I just typed up based on your original post.

Third, I get a count of the number of distinct nations and put it into a macro variable.  I could have gotten the count in step one, but this  step eliminates the possibility of duplicates.

Fourth, I have a macro with the Data step that will be setting PT and Cvt_Yr.  Passed in as parameters are the GeoGn, the StartYr, the EndYr, and the number corresponding to a given GeoGn's position in the macro array.  An intermediate dataset will be created using this number.

Fifth, I run the Data step using a second macro.  All the second macro does is iterate through the macro array, format the parameters, and invoke the first macro.

Sixth, I have a little macro that all it does is list out the names of the intermediate data sets.

Seventh and last, I have a step that combines all the intermediate datasets into one.

 

Give it a try with your data and see how it goes.

DATA	Law_Years;
DROP _:; LENGTH GeoGn $12; LENGTH LawYear $4; INPUT GeoGn $ LawYear ; _Count + 1; CALL SYMPUTX(CATS('GeoGn', PUT(_Count, 3.)), GeoGn); CALL SYMPUTX(CATS('LawYear', PUT(_Count, 3.)), LawYear); DATALINES; UNITEDS 1993 SOUTHKOREA 1997 BRAZIL 2000 CANADA 2000 CZECH 2001 LATVIA 2004 VIETNAM 2018 ; RUN; DATA Matching; LENGTH GeoGn $12; LENGTH Year $4; INPUT GeoGn $ Year $ ; DATALINES; SOUTHKOREA 1989 SOUTHKOREA 1993 SOUTHKOREA 1994 SOUTHKOREA 1995 SOUTHKOREA 1996 SOUTHKOREA 1997 SOUTHKOREA 1998 SOUTHKOREA 1999 SOUTHKOREA 2000 SOUTHKOREA 2001 SOUTHKOREA 2002 SOUTHKOREA 2021 CZECH 1989 CZECH 1997 CZECH 1998 CZECH 1999 CZECH 2000 CZECH 2001 CZECH 2002 CZECH 2003 CZECH 2004 CZECH 2005 CZECH 2006 CZECH 2021 ; RUN; PROC SQL NOPRINT; SELECT COUNT(DISTINCT GeoGn) INTO : Nations_Cnt FROM Law_Years; QUIT; %PUT NOTE: &=Nations_Cnt; %MACRO Set_PT(GeoGn, StartYr, EndYr, DSN_Nbr); DATA gen_post_treat_&DSN_Nbr; set matching; yr=input(year, ?? 32.); /*year is a character variable, so I convert to numeric variable*/ IF STRIP(UPCASE(GeoGn)) = "&GeoGn" AND 1990 <= Yr <= 2020; IF &StartYr <= Yr <= &EndYr THEN PT = 1; ELSE PT = 0; Cvt_Yr = Yr - &StartYr; RUN; %MEND Set_PT; %MACRO Create_Datasets; %LOCAL i; %DO i = 1 %TO &Nations_Cnt; %Set_PT(&&GeoGn&i, &&LawYear&i, 2020, &i); %END; %MEND Create_Datasets; %Create_Datasets; %MACRO List_Dataset_Names; %LOCAL i; %DO i = 1 %TO &Nations_Cnt; Gen_Post_Treat_%SYSFUNC(STRIP(&i)) %END; %MEND List_Dataset_Names; DATA Gen_Post_Treat; SET %List_Dataset_Names; ; RUN;

Jim

View solution in original post

3 REPLIES 3
jimbarbour
Meteorite | Level 14

@Phil_NZ ,

 

There are many ways to do this.  However, when I see the exact same process being done over and over but each time to a different entity, my mind immediately goes to macro processing.  Just as you were, I'm going to use a Data step to set PT and Cvt_Yr, but I'm going to put the Data step inside a macro wrapper and pass parameters to the macro.  Code once, parameterize, pass in parameters.  This actually minimizes code in the long run although if it's a "one and done" process that you'll never see again, it may not be worth it.

 

My code is below.  Comments: 

First I build a little data  set called Law_Years.  This data drives the overall process.  The nice thing about this code is that if you get additional countries, you add them to the Law_Years table and you do not have to change any code.  As I read each record, I load a macro array containing the GeoGn and the Law_Year.

Second, I read in some test data that I just typed up based on your original post.

Third, I get a count of the number of distinct nations and put it into a macro variable.  I could have gotten the count in step one, but this  step eliminates the possibility of duplicates.

Fourth, I have a macro with the Data step that will be setting PT and Cvt_Yr.  Passed in as parameters are the GeoGn, the StartYr, the EndYr, and the number corresponding to a given GeoGn's position in the macro array.  An intermediate dataset will be created using this number.

Fifth, I run the Data step using a second macro.  All the second macro does is iterate through the macro array, format the parameters, and invoke the first macro.

Sixth, I have a little macro that all it does is list out the names of the intermediate data sets.

Seventh and last, I have a step that combines all the intermediate datasets into one.

 

Give it a try with your data and see how it goes.

DATA	Law_Years;
DROP _:; LENGTH GeoGn $12; LENGTH LawYear $4; INPUT GeoGn $ LawYear ; _Count + 1; CALL SYMPUTX(CATS('GeoGn', PUT(_Count, 3.)), GeoGn); CALL SYMPUTX(CATS('LawYear', PUT(_Count, 3.)), LawYear); DATALINES; UNITEDS 1993 SOUTHKOREA 1997 BRAZIL 2000 CANADA 2000 CZECH 2001 LATVIA 2004 VIETNAM 2018 ; RUN; DATA Matching; LENGTH GeoGn $12; LENGTH Year $4; INPUT GeoGn $ Year $ ; DATALINES; SOUTHKOREA 1989 SOUTHKOREA 1993 SOUTHKOREA 1994 SOUTHKOREA 1995 SOUTHKOREA 1996 SOUTHKOREA 1997 SOUTHKOREA 1998 SOUTHKOREA 1999 SOUTHKOREA 2000 SOUTHKOREA 2001 SOUTHKOREA 2002 SOUTHKOREA 2021 CZECH 1989 CZECH 1997 CZECH 1998 CZECH 1999 CZECH 2000 CZECH 2001 CZECH 2002 CZECH 2003 CZECH 2004 CZECH 2005 CZECH 2006 CZECH 2021 ; RUN; PROC SQL NOPRINT; SELECT COUNT(DISTINCT GeoGn) INTO : Nations_Cnt FROM Law_Years; QUIT; %PUT NOTE: &=Nations_Cnt; %MACRO Set_PT(GeoGn, StartYr, EndYr, DSN_Nbr); DATA gen_post_treat_&DSN_Nbr; set matching; yr=input(year, ?? 32.); /*year is a character variable, so I convert to numeric variable*/ IF STRIP(UPCASE(GeoGn)) = "&GeoGn" AND 1990 <= Yr <= 2020; IF &StartYr <= Yr <= &EndYr THEN PT = 1; ELSE PT = 0; Cvt_Yr = Yr - &StartYr; RUN; %MEND Set_PT; %MACRO Create_Datasets; %LOCAL i; %DO i = 1 %TO &Nations_Cnt; %Set_PT(&&GeoGn&i, &&LawYear&i, 2020, &i); %END; %MEND Create_Datasets; %Create_Datasets; %MACRO List_Dataset_Names; %LOCAL i; %DO i = 1 %TO &Nations_Cnt; Gen_Post_Treat_%SYSFUNC(STRIP(&i)) %END; %MEND List_Dataset_Names; DATA Gen_Post_Treat; SET %List_Dataset_Names; ; RUN;

Jim

Phil_NZ
Barite | Level 11

Hi @jimbarbour 

I have a couple of questions in advance here, your code is quite a bit beyond my knowledge so I need to print it out and read it carefully.

 

1. What does the fourth dimension in your macro Set_PT mean? I did not see you refer it previously.

2. What is the purpose of creating the dataset "Matching" with not continuous years like that? Or as you said, you just used the number already from my post? Ahh, or do you mean "matching" is my original dataset? So, does it mean that, in my case, I can create a dataset Nations_Cnt directly by using PROC SQL from dataset matching without creating the dataset Law_Years like your?

 

3. whether the code 

%PUT  NOTE:  &=Nations_Cnt;

just to write down to the log that we are heading to a specific country?

4. And I believe that, you write some abbreviation with purpose, can I ask what do "DSN_Nrb" and "Nations_Cnt" stand for, that I can explain these words and the situation easier?

 

Thank you so much

Phil.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
jimbarbour
Meteorite | Level 14

@Phil_NZ wrote:

1. What does the fourth dimension in your macro Set_PT mean? I did not see you refer it previously.


@Phil_NZ , the macro processes one country at a time.  The Data step writes out a file named, "gen_post_treat."  However, if I use the same filename for every country, the 2nd country will overwrite the first, the 3rd the 2nd, and so on.  So what I do is pass in the fourth parameter, &i.  The macro variable &i is just a number, a counter, that starts at one and goes up to however many countries there are in your Law_Year table.  This number is suffixed onto the file name so that an intermediate file is created with a unique name for each country.

 


@Phil_NZ wrote:

2. What is the purpose of creating the dataset "Matching" with not continuous years like that? Or as you said, you just used the number already from my post?


Well, mostly I took the examples from your original post.  I added 1989 and 2021 to test years that fall out of the range of consideration.  Basically, anything that is before 1990 or after 2020, we want to exclude.  By putting 1989 and 2021 into the data, I test my range.  If my code filters out 1989 and 2021, then I have validated that the code is working properly.

 


@Phil_NZ wrote:

3. whether the code 

%PUT  NOTE:  &=Nations_Cnt;

just to write down to the log that we are heading to a specific country?


%PUT does indeed write to the log.  The %PUT immediately follows a Proc SQL step that creates a macro variable.  I want to know if the value of the macro variable is correct, so I write it to the log.  In the log, it looks like the below.  Nations count is simple a count of how many nations there are in the Law_Year table.  In this case, there are seven (US, S Korea, Brazil, Canada, Czech Republic, Latvia, and Vietnam).

jimbarbour_0-1623545252760.png

 


@Phil_NZ wrote:

4. And I believe that, you write some abbreviation with purpose, can I ask what do "DSN_Nrb" and "Nations_Cnt" stand for, that I can explain these words and the situation easier?


DSN_Nbr is just a number from one to the number of nations.  Nations_Cnt is a count of the number of nations in the Law_Year table.  You gave 7 examples in your original post.  Elsewhere you mentioned that there are I believe 64.  "DSN" stands for Data Set Name.  Therefore DSN_Nbr is the number that is suffixed on to the back end of the data set name to make each data set name unique.

Here, below, using the examples from your original post, are the intermediate data sets I created, one for each country.  Notice how there is a number on the back end of each dataset name.  The number on the back of each dataset name is the dataset name number or DSN_Nbr.

jimbarbour_1-1623545664741.png

 

I hope that helps, but this can't be easy code if someone is new to SAS.  However, you did ask for code that would make the job of working with 64 nations easier, so...  🙂

 

In the Data step, I'm excluding all the data that is not applicable and setting all the values doing everything in just three commands: Two IF statements and one mathematical expression.  I can't make it much more concise than that.  Three commands for 64 nations is fairly reasonable I think.

 

Jim

 

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 3 replies
  • 606 views
  • 3 likes
  • 2 in conversation