BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ASHISH2525
Quartz | Level 8

Hi All,

 

I need to update one columns value based on other rows specific value.

in below data set i want to update Mvalue as "Progress" for all the rows of a type if any single row contains "Progress" in Mvalue.

eg: if type C contains Mvalue as "Progress"  then for all the rows of type "C" should be tagged as "Progress" in Mvalue.

 

data one;

input type $  Mvalue $12.;


datalines;

A closed

B Progress

C Closed

C Progress

C Closed

E closed

E Closed

E Progress

;

run;

 

Regards,

Ashish

1 ACCEPTED SOLUTION

Accepted Solutions
FreelanceReinh
Jade | Level 19

Hi @ASHISH2525,

 

One way to achieve this is a double "DOW loop":

data want(drop=flag);
do until(last.type);
  set one;
  by type;
  if mvalue='Progress' then flag=1;
end;
do until(last.type);
  set one;
  by type;
  if flag then mvalue='Progress';
  output;
end;
run;

View solution in original post

9 REPLIES 9
FreelanceReinh
Jade | Level 19

Hi @ASHISH2525,

 

One way to achieve this is a double "DOW loop":

data want(drop=flag);
do until(last.type);
  set one;
  by type;
  if mvalue='Progress' then flag=1;
end;
do until(last.type);
  set one;
  by type;
  if flag then mvalue='Progress';
  output;
end;
run;
PeterClemmensen
Tourmaline | Level 20

@FreelanceReinh beat me by 30 sec and a few lines 😉

FreelanceReinh
Jade | Level 19

@PeterClemmensen wrote:

@FreelanceReinh beat me by 30 sec and a few lines 😉


It's not easy to beat someone as fast as you. 🙂

 

@ASHISH2525: So, this solution has been confirmed independently. Other possible approaches include PROC SQL (using group by type and max(mvalue='Progress'), but with the drawback that the order of observations within a TYPE value is not maintained) and a "self merge" DATA step (with the drawback that the ugly note "MERGE statement has more than one data set with repeats of BY values" may appear in the log, which could be avoided with a preliminary step, though).

PeterClemmensen
Tourmaline | Level 20

One way

 

data one;
input type $  Mvalue $12.;
datalines;
A closed
B Progress
C Closed
C Progress
C Closed
E closed
E Closed
E Progress
;
run;

data two;
   flag=0;
   do until (last.type);
      set one;
      by type;
      if Mvalue='Progress' then flag=1;
   end;
   do until (last.type);
      set one;
      by type;
      if flag=1 then Mvalue='Progress';
      output;
   end;
run;
Ksharp
Super User
data one;
input type $  Mvalue $12.;
datalines;
A closed
B Progress
C Closed
C Progress
C Closed
E closed
E Closed
E Progress
;
data want;
 merge one one(keep=type  Mvalue where=(_Mvalue='Progress') rename=(Mvalue=_Mvalue));
 by type;
 want_Mvalue=coalescec(_Mvalue,Mvalue);
 drop _Mvalue Mvalue;
run;
FreelanceReinh
Jade | Level 19

Yes, this is the "self merge" approach, my version of which was this:

 

data want(drop=_m);
merge one(rename=(mvalue=_m))
      one(where=(mvalue='Progress') in=p);
by type;
if ~p then mvalue=_m;
run;

As mentioned in my previous post, it has the disadvantage of potentially producing that unwanted note about "more than one data set with repeats of BY values" if there is a BY group containing more than one observation with mvalue='Progress' in dataset ONE. But it works properly in spite of that note.

 

Ksharp
Super User

OK. @FreelanceReinh  ,

I just want put it further for fun.

 

data one;
input type $  Mvalue $12.;
datalines;
A closed
B Progress
C Closed
C Progress
C Closed
E closed
E Closed
E Progress
;
proc sql;
create table want as
select 	*,case when sum(Mvalue='Progress') then  'Progress' else Mvalue end as Want_Mvalue
 from one 
  group by type
;
quit;
FreelanceReinh
Jade | Level 19

Now that we have "all" three major approaches on the table, it would be interesting to see how they perform on a larger input dataset. So, let's create one:

 

/* Create a larger test dataset */

data one(drop=i j);
length type $8 mvalue $12;
call streaminit(27182818);
do i=1 to 1e7;
  type=put(i, hex8.);
  do j=1 to rand('integer',10);
    mvalue=choosec(rand('integer',4),'closed','Progress','Closed','other');
    output;
  end;
end;
run; /* 54995649 obs. */

The log under the spoiler below shows the run times ("real time") on my workstation:

 

  1. DOW loop approach:    8.19 s
  2. Self merge approach:  9.83 s
  3. PROC SQL approach:   17.15 s

The above ranking was confirmed in a second run (8.57 s, 9.82 s, 16.47 s) and in a third run with the order of the three steps reversed (8.65 s, 9.73 s, 16.74 s).

 

To be fair, it should be noted that the PROC SQL approach does not require a sorted (or indexed) input dataset (by type), unlike the other two approaches. With a randomly sorted version of input dataset ONE it took about 32 - 33 s.

 

Spoiler
1    data one(drop=i j);
2    length type $8 mvalue $12;
3    call streaminit(27182818);
4    do i=1 to 1e7;
5      type=put(i, hex8.);
6      do j=1 to rand('integer',10);
7        mvalue=choosec(rand('integer',4),'closed','Progress','Closed','other');
8        output;
9      end;
10   end;
11   run;

NOTE: The data set WORK.ONE has 54995649 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           5.29 seconds
      cpu time            5.28 seconds


12
13   /* DOW loop approach */
14
15   data want1(drop=flag);
16   do until(last.type);
17     set one;
18     by type;
19     if mvalue='Progress' then flag=1;
20   end;
21   do until(last.type);
22     set one;
23     by type;
24     if flag then mvalue='Progress';
25     output;
26   end;
27   run;

NOTE: There were 54995649 observations read from the data set WORK.ONE.
NOTE: There were 54995649 observations read from the data set WORK.ONE.
NOTE: The data set WORK.WANT1 has 54995649 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           8.19 seconds
      cpu time            8.20 seconds


28
29   /* Self merge approach */
30
31   data want2(drop=_m);
32   merge one(rename=(mvalue=_m))
33         one(where=(mvalue='Progress') in=p);
34   by type;
35   if ~p then mvalue=_m;
36   run;

NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 54995649 observations read from the data set WORK.ONE.
NOTE: There were 13752908 observations read from the data set WORK.ONE.
      WHERE mvalue='Progress';
NOTE: The data set WORK.WANT2 has 54995649 observations and 2 variables.
NOTE: DATA statement used (Total process time):
      real time           9.83 seconds
      cpu time            9.82 seconds


37
38   /* PROC SQL approach */
39
40   proc sql;
41   create table want3 as
42   select type, case when max(mvalue='Progress')
43                  then 'Progress'
44                  else mvalue
45                end as mvalue
46   from one
47   group by type;
NOTE: The query requires remerging summary statistics back with the original data.
NOTE: SAS threaded sort was used.
NOTE: Table WORK.WANT3 created, with 54995649 rows and 2 columns.

48   quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           17.15 seconds
      cpu time            28.00 seconds

 

 

PeterClemmensen
Tourmaline | Level 20

@FreelanceReinh cool stuff 🙂

 

Here are my results on the large data. Just for fun I added a Hash Object approach

 

data one(drop=i j);
length type $8 mvalue $12;
call streaminit(27182818);
do i=1 to 1e7;
  type=put(i, hex8.);
  do j=1 to rand('integer',10);
    mvalue=choosec(rand('integer',4),'closed','Progress','Closed','other');
    output;
  end;
end;
run; /* 54995649 obs. */

/* 10.72 sec */
data want1(drop=flag);
do until(last.type);
  set one;
  by type;
  if mvalue='Progress' then flag=1;
end;
do until(last.type);
  set one;
  by type;
  if flag then mvalue='Progress';
  output;
end;
run;

/* 11.8 sec */
data want2(drop=_m);
merge one(rename=(mvalue=_m))
      one(where=(mvalue='Progress') in=p);
by type;
if ~p then mvalue=_m;
run;

/* 37.3 sec */
proc sql;
create table want3 as
select type, case when max(mvalue='Progress')
               then 'Progress'
               else mvalue
             end as mvalue
from one
group by type;
quit;

/* 14.2 sec */
data want4;
   if _N_ = 1 then do;
      declare hash h(dataset:"one(where=(Mvalue='Progress'))", hashexp:20);
      h.defineKey('type');
      h.defineDone();
   end;

   set one;
   if h.check()=0 then Mvalue='Progress';
run;

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 4881 views
  • 11 likes
  • 4 in conversation