hi,
can someone recommend some code so that sas check for condition: if value IDs is duplicate [could be more than 1 duplicate], then scan Var1 for content, if no content, create new_dummy_var=1. If value IDs not duplicate, skip.
Var1 is numeric.
UPDATING Example outcome:
IDs Var 1 new_dummy_var
A001 5
A001 6
A001 5
A002 1
A002 1
A003 5
A003
A004
**Update: In the example outcome, note that because one of the duplicate observation A003 has a 5 in Var1, the program should move on, and not assign a value in New_dummy_var.
**New_dummy_var is only meant to be a flag, so not care if char or numeric.
Thank you very much
Dave
This can also be done in a data step:
data want (drop=_:); do until (last.IDs); set have; by IDs; if first.IDs then do; _Vars=0; _Recs=0; end; _Recs+1; if not missing(var) then _Vars+1; end; do until (last.IDs); set have; by IDs; if _Recs GT 1 and _Vars=0 then new_dummy_var=1; output; end; run;
Art, CEO, AnalystFinder.com
Expand on the previous solution @brulard
data t;
input ID$ Var1$;
cards;
a001 A
a001 A
a001 A
a002 .
a002 .
a003 .
a004 .
;
run;
data t2;
set t;
by id;
new_dummy_var= ^(first.ID and last.ID) and var1='';
run;
Edit: I created Var1 as a character variable, but if yours is numeric, this statement would work:
new_dummy_var= ^(first.ID and last.ID) and var1=.;
The missing(var1) function that Art297 used below works with either character or numeric data type.
data want; set have; by ID; if not(first.id and last.id) then do; if missing(Var1) then new_dummy_var=1; end; run;
Art, CEO, AnalystFinder.com
Just to clarify the question, what should the dummy variable be in this case?
IDs Var 1 new_dummy_var
A001
A001
A001 5
Your answer makes a big difference in the complexity of the programming.
It could be a number or char. thanks
@brulard the question is, if there are multiple rows for any ID, but not all rows have a value in Var1 (some do), then what would you want the new_dummy_var to be? Should new_dummy_var be set to 1 only if there are no values in Var1 at all? Or should it be set to 1 even if there are values in Var1 on some rows, but not on all rows?
hi nehalsanghvi,
if there are multiple rows for any ID, but not all rows have a value in Var1 (some do), then what would you want the new_dummy_var to be?
The new_dummy_var to be set to 1 only if there are no values in Var1 at all. Thanks,
As Astounding said.
data have;
input IDs $ Var ;
cards;
A001 5 .
A001 6 .
A001 5 .
A002 .
A002 .
A003 . .
A004 . .
;
run;
proc sql;
select *,case when(count(*) ne 1 and sum(missing(var))=count(*)) then 1 else . end as dummy
from have
group by ids;
quit;
This can also be done in a data step:
data want (drop=_:); do until (last.IDs); set have; by IDs; if first.IDs then do; _Vars=0; _Recs=0; end; _Recs+1; if not missing(var) then _Vars+1; end; do until (last.IDs); set have; by IDs; if _Recs GT 1 and _Vars=0 then new_dummy_var=1; output; end; run;
Art, CEO, AnalystFinder.com
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.