I have a data set were few variables are repeating but i only want to keep info for a specific variable
Have:
Name ID Code code_description type
Sarah 0007 A19 DM test mining
Sarah 0007 B45 DM test list
Jack 1111 F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
Want
Name ID Code code_description type
.. .. A19 DM test mining
Sarah 0007 B45 DM test list
.. ... F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
so i want to delete the name and ID or just place a zero or none for mining when that data is repeated like this.
Repeating , ok. How many repeating or is it just 2 like in your sample? Which record within a by group do you want to keep(some logic for that) plz?
data have;
input Name $ ID $ Code $ code_description & $15. type $;
cards;
Sarah 0007 A19 DM test mining
Sarah 0007 B45 DM test list
Jack 1111 F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
;
data want;
set have;
if type='mining' then call missing(name,id);
run;
or
data want;
set have;
if type ne 'list' then call missing(name,id);
run;
thank you replying but i have other 'mining' type data in my set and they dont have 'list' type repeating for them so when that happens i want to keep the info as it is
Does that mean those others have just 1 record for an id? or is it 2 records both with type='mining' ?
Can you modify the sample
data have;
input Name $ ID $ Code $ code_description & $15. type $;
cards;
Sarah 0007 A19 DM test mining
Sarah 0007 B45 DM test list
Jack 1111 F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
Bob 1222 F56 HPY type 1 mining
Rob 3333 A19 DM test mining
;
so some ids only have mining type but i dont wwant to remove their info. I only want to remover mining info if they have list type and mining type
data have;
input Name $ ID $ Code $ code_description & $15. type $;
cards;
Sarah 0007 A19 DM test mining
Sarah 0007 B45 DM test list
Jack 1111 F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
Bob 1222 F56 HPY type 1 mining
Rob 3333 A19 DM test mining
;
proc sort data=have;
by id;
run;
data want;
call missing(f);
do until(last.id);
set have;
by id;
if type='list' then f=1;
end;
do until(last.id);
set have;
by id;
if f and type='mining' then call missing(name,id);
output;
end;
drop f;
run;
It sounds like that you have multiple observations per NAME,ID group and when that happens you want to remove the NAME and ID values from the observations that has TYPE='mining' and keep them for the observations that have TYPE='list'.
This sounds like a bad idea. What is the reason you want to do this? Perhaps you are just trying to produce a report?
If so then perhaps it is better to leave the values in place and use a reporting method that display the id variables just once per group?
data have ;
input Name $ ID $ Code $ code_description &:$15. type $ ;
cards;
Sarah 0007 A19 DM test mining
Sarah 0007 B45 DM test list
Jack 1111 F56 HPY type 1 mining
Jack 1111 H78 HPY type 1 list
;
proc sort; by Name ID type ; run;
proc print ;
by Name Id;
id Name Id;
run;
code_ Name ID Code description type Jack 1111 H78 HPY type 1 list F56 HPY type 1 mining Sarah 0007 B45 DM test list A19 DM test mining
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.