BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
TrueTears
Obsidian | Level 7

I have a dataset similar to the following (but with more years and firm_id's)

 

data have;
input year firm_id city_id operate action;
cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
;
run;

 

I also have a list of each city_id and its adjacent city_id (city_id_adj) and a binary variable called "use":

 

data list;
input city_id city_id_adj use;
cards;
1003 1009 1
1003 1010 1
1003 1085 1
1004 1007 0
1004 1009 0
1004 1099 0
1004 1100 0
1007 1004 0
1007 1823 1
1009 1003 1
1009 1004 0
1009 1010 0
1010 1003 1
1010 1009 0
1013 1053 1
1013 1074 1
1053 1013 1
1074 1013 1
1075 1743 1
1075 1744 1
;
run;

 

I wish to produce the following dataset (I have more years and more firm_id's, so the code should work for multiple years and firm_id's)

 

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
;
run;

To produce the above dataset, the rule for producing the variable action_adj is as follows. Fix the year and firm_id:


If operate = 0, then action_adj = 3. An example is city_id 1010.


For operate = 1:


For each city_id, we check its corresponding city_id_adj in dataset "list" that has use=1. If in this set of city_id_adj, there is one that has operate = 0 in the dataset "have", then action_adj = 2. For example, city_id 1003 corresponds to 1009, 1010, and 1085 (all of these have use=1). Here 1010 has operate = 0, so action_adj = 2 for city_id 1003.


If the list of city_id_adj does not have any that has operate=0, then we do the following:

  • action_adj = 1 if there is at least one city_id_adj (with use=1) that has action = 1 in the dataset "have". For example, consider city_id 1009, in the dataset "list", it corresponds to city_id_adj 1003 only (since this is the only one with use = 1). In the dataset "have", 1003 has action=1, so we set action_adj = 1 for 1009. Another example is city_id 1013. In the dataset "list", 1013 corresponds to 1053 and 1074 (both have use=1). Although 1074 has action=0, but 1053 has action=1, so we set action_adj = 1 for 1013.
  • action_adj = 0 for all other cases.

Note all of the above is for a particular year and particular firm_id. This needs to be done for all firm_id and year pairs.

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@TrueTears 

Ideally you try and understand the code to a level where you can make such a change yourself. But here you go.


data want(keep=year firm_id city_id operate action action_adj);
  if _n_=1 then
    do;
      if 0 then set list;
      dcl hash list(dataset:'list(where=(use=1))', multidata:'y');
      list.defineKey('city_id');
      list.defineData('city_id_adj');
      list.defineDone();

      dcl hash Operate0(dataset:'have(keep=year city_id operate where=(operate=0))');
      Operate0.defineKey('year','city_id');
      Operate0.defineDone();

      dcl hash Action1(dataset:'have(keep=year city_id action where=(action=1))');
      Action1.defineKey('year','city_id');
      Action1.defineDone();
    end;
  call missing(of _all_);

  set have;

  /* If operate = 0, then action_adj = 3 */
  if operate = 0 then 
    do;
      action_adj = 3;
      output;
      return;
    end;

  /* For each city_id in have, we check its corresponding city_id_adj in dataset "list" that has use=1 */
  if list.check()=0 then
    do;
      /* If in this set of city_id_adj, there is one that has operate = 0 in the dataset "have", then action_adj = 2 */
      list.reset_dup();
      do while(list.do_over()=0);
        if Operate0.check(key:year, key:city_id_adj)=0 then
          do;
            action_adj = 2;
            output;
            return;
          end;
      end;

      /* action_adj = 1 if there is at least one city_id_adj (with use=1) that has action = 1 in the dataset "have". */
      list.reset_dup();
      do while(list.do_over()=0);
        if Action1.check(key:year, key:city_id_adj)=0 then
          do;
            action_adj = 1;
            output;
            return;
          end;
      end;

    end;

  /* action_adj = 0 for all other cases */
  action_adj = 0;
  output;
  return;

run;

View solution in original post

6 REPLIES 6
Jagadishkatam
Amethyst | Level 16

Please try the below code

 

data have;
input year firm_id city_id operate action;
cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
;
run;


data list;
input city_id city_id_adj use;
cards;
1003 1009 1
1003 1010 1
1003 1085 1
1004 1007 0
1004 1009 0
1004 1099 0
1004 1100 0
1007 1004 0
1007 1823 1
1009 1003 1
1009 1004 0
1009 1010 0
1010 1003 1
1010 1009 0
1013 1053 1
1013 1074 1
1053 1013 1
1074 1013 1
1075 1743 1
1075 1744 1
;
run;


proc sort data=have;
by city_id;
run;

proc sort data=list;
by city_id;
run;

proc sort data=have out=operate;
by city_id;
where operate=0;
run;

proc sort data=have out=action;
by city_id;
where action=1;
run;

data want;
merge have(in=a) list(in=b)  ;
by city_id;
if a and b;
if operate=0 then action_adj=3;
run;

proc sort data=want;
by city_id_adj;
run;

data want2;
merge want(in=a) operate(in=v keep=city_id operate rename=(city_id=city_id_adj operate=operate2)) action(keep=city_id action rename=(city_id=city_id_adj action=action2));
by city_id_adj;
if a ;
if use=1 and operate=1 and operate2=0 then action_adj=2;
if use=1 and action2=1 and operate2^=0  then action_adj=1;
if action_adj=. then action_adj=0;
run;

proc sort data=want2;
by city_id action_adj;
run;

data want3;
set want2;
by city_id action_adj;
if last.city_id;
run;
Thanks,
Jag
Patrick
Opal | Level 21

@TrueTears 

Thank you for posting sample data as working data steps and also for taking your time and explain the logic you need with samples and desired result. 

Below some code which produces the desired result.

data have;
  input year firm_id city_id operate action;
  cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
;

data list;
  input city_id city_id_adj use;
  cards;
1003 1009 1
1003 1010 1
1003 1085 1
1004 1007 0
1004 1009 0
1004 1099 0
1004 1100 0
1007 1004 0
1007 1823 1
1009 1003 1
1009 1004 0
1009 1010 0
1010 1003 1
1010 1009 0
1013 1053 1
1013 1074 1
1053 1013 1
1074 1013 1
1075 1743 1
1075 1744 1
;

data want(keep=year firm_id city_id operate action action_adj);
  if _n_=1 then
    do;
      if 0 then set list;
      dcl hash list(dataset:'list(where=(use=1))', multidata:'y');
      list.defineKey('city_id');
      list.defineData('city_id_adj');
      list.defineDone();

      dcl hash Operate0(dataset:'have(keep=city_id operate where=(operate=0))');
      Operate0.defineKey('city_id');
      Operate0.defineDone();

      dcl hash Action1(dataset:'have(keep=city_id action where=(action=1))');
      Action1.defineKey('city_id');
      Action1.defineDone();
    end;
  call missing(of _all_);

  set have;

  /* If operate = 0, then action_adj = 3 */
  if operate = 0 then 
    do;
      action_adj = 3;
      output;
      return;
    end;

  /* For each city_id in have, we check its corresponding city_id_adj in dataset "list" that has use=1 */
  if list.check()=0 then
    do;
      /* If in this set of city_id_adj, there is one that has operate = 0 in the dataset "have", then action_adj = 2 */
      list.reset_dup();
      do while(list.do_over()=0);
        if Operate0.check(key:city_id_adj)=0 then
          do;
            action_adj = 2;
            output;
            return;
          end;
      end;

      /* action_adj = 1 if there is at least one city_id_adj (with use=1) that has action = 1 in the dataset "have". */
      list.reset_dup();
      do while(list.do_over()=0);
        if Action1.check(key:city_id_adj)=0 then
          do;
            action_adj = 1;
            output;
            return;
          end;
      end;

    end;

  /* action_adj = 0 for all other cases */
  action_adj = 0;
  output;
  return;

run;

proc print;
run;

TrueTears
Obsidian | Level 7

@Patrick

 

Thank you! I tried your code, it seems to work for one particular year and firm_id pair but does not seem to work for multiple. Sorry that I did not add additional year and firm_id in my original post. Consider the following extended "have" dataset:

 

data have;
  input year firm_id city_id operate action;
  cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
2009 28332 1003 1 0
2009 28332 1010 1 0
2009 28332 1013 0 1
2009 28332 1053 1 0
2009 28332 1074 1 1
2009 28332 1075 1 1

;

I added in additional year 2009 and another firm_id 28332. The "list" dataset always remains the same. Now "want" becomes:

 

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 0
2009 28332 1010 1 0 0
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

However, your code gives the following (which is almost correct):

 

data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 2
2009 28332 1010 1 0 1
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

The first difference is for year 2009, firm_id 28332, and city_id 1003 which should have action_adj = 0 rather than action_adj = 2. This is because in the list, city_id 1003 corresponds to 1009, 1010, and 1085, but although 1010 has operate=0 for the year/firm_id pair 2008/28013, it has operate=1 for the year/firm_id pair 2009/28332.

 

The second difference is for year 2009, firm_id 28332, and city_id 1010 which should have action_adj=0 rather than action_adj=1. This is because city_id 1010 corresponds to 1003 and although 1003 has action=1 for year/firm_id 2008/28013, it has action=0 for year/firm_id 2009/28332.

 

Could you slightly alter your code so the conditions that I outlined in my original post is applied for every year/firm_id pair and not for the entire dataset?

 

Thank you so very much.

TrueTears
Obsidian | Level 7

@Patrick


Thank you! I tried your code, it seems to work for one particular year and firm_id pair but does not seem to work for multiple. Sorry that I did not add additional year and firm_id in my original post. Consider the following extended "have" dataset:


data have;
input year firm_id city_id operate action;
cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
2009 28332 1003 1 0
2009 28332 1010 1 0
2009 28332 1013 0 1
2009 28332 1053 1 0
2009 28332 1074 1 1
2009 28332 1075 1 1

;

I added in additional year 2009 and another firm_id 28332. The "list" dataset always remains the same. Now "want" becomes:


data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 0
2009 28332 1010 1 0 0
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

However, your code gives the following (which is almost correct):


data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 2
2009 28332 1010 1 0 1
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

The first difference is for year 2009, firm_id 28332, and city_id 1003 which should have action_adj = 0 rather than action_adj = 2. This is because in the list, city_id 1003 corresponds to 1009, 1010, and 1085, but although 1010 has operate=0 for the year/firm_id pair 2008/28013, it has operate=1 for the year/firm_id pair 2009/28332.


The second difference is for year 2009, firm_id 28332, and city_id 1010 which should have action_adj=0 rather than action_adj=1. This is because city_id 1010 corresponds to 1003 and although 1003 has action=1 for year/firm_id 2008/28013, it has action=0 for year/firm_id 2009/28332.


Could you slightly alter your code so the conditions that I outlined in my original post is applied for every year/firm_id pair and not for the entire dataset?


Thank you so very much.

TrueTears
Obsidian | Level 7

@Patrick Thank you! I tried your code, it seems to work for one particular year and firm_id pair but does not seem to work for multiple. Sorry that I did not add additional year and firm_id in my original post. Consider the following extended "have" dataset:

 

data have;
  input year firm_id city_id operate action;
  cards;
2008 28013 1003 1 1
2008 28013 1004 1 1
2008 28013 1007 1 0
2008 28013 1009 1 1
2008 28013 1010 0 1
2008 28013 1013 1 0
2008 28013 1053 1 1
2008 28013 1074 1 0
2008 28013 1075 1 1
2009 28332 1003 1 0
2009 28332 1010 1 0
2009 28332 1013 0 1
2009 28332 1053 1 0
2009 28332 1074 1 1
2009 28332 1075 1 1

;
run;

I added in additional year 2009 and another firm_id 28332. The "list" dataset always remains the same. Now "want" becomes:

 


data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 0
2009 28332 1010 1 0 0
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

However, your code gives the following (which is almost correct):


data want;
input year firm_id city_id operate action action_adj;
cards;
2008 28013 1003 1 1 2
2008 28013 1004 1 1 0
2008 28013 1007 1 0 0
2008 28013 1009 1 1 1
2008 28013 1010 0 1 3
2008 28013 1013 1 0 1
2008 28013 1053 1 1 0
2008 28013 1074 1 0 0
2008 28013 1075 1 1 0
2009 28332 1003 1 0 2
2009 28332 1010 1 0 1
2009 28332 1013 0 1 3
2009 28332 1053 1 0 2
2009 28332 1074 1 1 2
2009 28332 1075 1 1 0

;
run;

The first difference is for year 2009, firm_id 28332, and city_id 1003 which should have action_adj = 0 rather than action_adj = 2. This is because in the list, city_id 1003 corresponds to 1009, 1010, and 1085, but although 1010 has operate=0 for the year/firm_id pair 2008/28013, it has operate=1 for the year/firm_id pair 2009/28332.


The second difference is for year 2009, firm_id 28332, and city_id 1010 which should have action_adj=0 rather than action_adj=1. This is because city_id 1010 corresponds to 1003 and although 1003 has action=1 for year/firm_id 2008/28013, it has action=0 for year/firm_id 2009/28332.


Could you slightly alter your code so the conditions that I outlined in my original post is applied for every year/firm_id pair and not for the entire dataset?


Thank you so very much.

Patrick
Opal | Level 21

@TrueTears 

Ideally you try and understand the code to a level where you can make such a change yourself. But here you go.


data want(keep=year firm_id city_id operate action action_adj);
  if _n_=1 then
    do;
      if 0 then set list;
      dcl hash list(dataset:'list(where=(use=1))', multidata:'y');
      list.defineKey('city_id');
      list.defineData('city_id_adj');
      list.defineDone();

      dcl hash Operate0(dataset:'have(keep=year city_id operate where=(operate=0))');
      Operate0.defineKey('year','city_id');
      Operate0.defineDone();

      dcl hash Action1(dataset:'have(keep=year city_id action where=(action=1))');
      Action1.defineKey('year','city_id');
      Action1.defineDone();
    end;
  call missing(of _all_);

  set have;

  /* If operate = 0, then action_adj = 3 */
  if operate = 0 then 
    do;
      action_adj = 3;
      output;
      return;
    end;

  /* For each city_id in have, we check its corresponding city_id_adj in dataset "list" that has use=1 */
  if list.check()=0 then
    do;
      /* If in this set of city_id_adj, there is one that has operate = 0 in the dataset "have", then action_adj = 2 */
      list.reset_dup();
      do while(list.do_over()=0);
        if Operate0.check(key:year, key:city_id_adj)=0 then
          do;
            action_adj = 2;
            output;
            return;
          end;
      end;

      /* action_adj = 1 if there is at least one city_id_adj (with use=1) that has action = 1 in the dataset "have". */
      list.reset_dup();
      do while(list.do_over()=0);
        if Action1.check(key:year, key:city_id_adj)=0 then
          do;
            action_adj = 1;
            output;
            return;
          end;
      end;

    end;

  /* action_adj = 0 for all other cases */
  action_adj = 0;
  output;
  return;

run;

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 6 replies
  • 1405 views
  • 1 like
  • 3 in conversation