hi,
can someone recommend some code so that sas check for condition: if value IDs is duplicate [could be more than 1 duplicate], then scan Var1 for content, if no content, create new_dummy_var=1. If value IDs not duplicate, skip.
Var1 is numeric.
UPDATING Example outcome:
IDs Var 1 new_dummy_var
A001 5
A001 6
A001 5
A002 1
A002 1
A003 5
A003
A004
**Update: In the example outcome, note that because one of the duplicate observation A003 has a 5 in Var1, the program should move on, and not assign a value in New_dummy_var.
**New_dummy_var is only meant to be a flag, so not care if char or numeric.
Thank you very much
Dave
This can also be done in a data step:
data want (drop=_:);
  do until (last.IDs);
    set have;
    by IDs;
    if first.IDs then do;
      _Vars=0;
      _Recs=0;
    end;
    _Recs+1;
    if not missing(var) then _Vars+1;
  end;
  do until (last.IDs);
    set have;
    by IDs;
    if _Recs GT 1 and _Vars=0 then new_dummy_var=1;
    output;
  end;
run;
Art, CEO, AnalystFinder.com
Expand on the previous solution @brulard
data t;
input ID$ Var1$;
cards;
a001 A
a001 A
a001 A
a002 .
a002 .
a003 .
a004 .
;
run;
data t2;
set t;
by id;
new_dummy_var= ^(first.ID and last.ID) and var1='';
run;
Edit: I created Var1 as a character variable, but if yours is numeric, this statement would work:
new_dummy_var= ^(first.ID and last.ID) and var1=.;
The missing(var1) function that Art297 used below works with either character or numeric data type.
data want;
  set have;
  by ID;
  if not(first.id and last.id) then do;
    if missing(Var1) then new_dummy_var=1;
  end;
run;
Art, CEO, AnalystFinder.com
Just to clarify the question, what should the dummy variable be in this case?
IDs Var 1 new_dummy_var
A001
A001
A001 5
Your answer makes a big difference in the complexity of the programming.
It could be a number or char. thanks
@brulard the question is, if there are multiple rows for any ID, but not all rows have a value in Var1 (some do), then what would you want the new_dummy_var to be? Should new_dummy_var be set to 1 only if there are no values in Var1 at all? Or should it be set to 1 even if there are values in Var1 on some rows, but not on all rows?
hi nehalsanghvi,
if there are multiple rows for any ID, but not all rows have a value in Var1 (some do), then what would you want the new_dummy_var to be?
The new_dummy_var to be set to 1 only if there are no values in Var1 at all. Thanks,
As Astounding said.
data have;
input IDs  $        Var    ;
cards;
A001           5 .
A001           6 .
A001           5    .
A002            .                
A002            .                
A003 . .
A004 . .
;
run;
proc sql;
select *,case when(count(*) ne 1 and sum(missing(var))=count(*)) then 1 else . end as dummy
 from have
  group by ids;
quit;This can also be done in a data step:
data want (drop=_:);
  do until (last.IDs);
    set have;
    by IDs;
    if first.IDs then do;
      _Vars=0;
      _Recs=0;
    end;
    _Recs+1;
    if not missing(var) then _Vars+1;
  end;
  do until (last.IDs);
    set have;
    by IDs;
    if _Recs GT 1 and _Vars=0 then new_dummy_var=1;
    output;
  end;
run;
Art, CEO, AnalystFinder.com
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
