Hi all,
I have a data set where a small number of values are the same but actually they belong to different places. I want to scan the data set and if the first values are the same but the second values are different then I want to just append the second value to the values that are the same as others. Here is an example,
data have;
ID city
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245 dc 6
data want;
ID city
123abc 3
123abc 3
123abc 3
123abc4 4
123abc4 4
123abc4 4
1245dc 6
As shown above, I want to add the city value at the end of the ID value for the city equal to 4 because I need a data set only with distinct IDs if they belong to different places.
@mmhxc5 wrote:
Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.
If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.
If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.
...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.
data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
data want(drop=_:);
if _n_=1 then
do;
if 0 then set have(rename=(id=id2 city=_city));
dcl hash hhave();
hhave.defineKey('id');
hhave.defineData('_city');
hhave.defineDone();
dcl hash hwant();
hwant.defineKey('city','id');
hwant.defineData('city','id2');
hwant.defineDone();
end;
call missing(of _all_);
set have;
if hwant.find() ne 0 then
do;
if hhave.find() ne 0 then
do;
_city=city;
id2=id;
hhave.add();
hwant.add();
end;
else
if city ne _city then
do;
id2=cats(id,city);
hwant.add();
end;
end;
run;
proc print;
run;
proc sort data=have; by id city; run;
data want;
set have;
by id city;
retain id2;
if first.id then id2=id;
if first.city and ^first.id then id2 = id||city;
run;
I hope this helps
proc sort data=have; by id city; run;
data want(rename=(id2=id));
set have;
by id city;
retain id2;
if first.id then id2=id;
if first.city and ^first.id then id2 = id||city;
drop id;
run;
Thank you for your help. Would this code add all 'city' values to the IDs? I want the first ID associated with a city to be as it is and all other IDs that are associated with other cities (any number of cities) to have the city value to be added to the ID.
What is the defined length of the ID variable? If the variable has a maximum length of 6, implied from the shown example, and you attempt to add a suffix of one character to a value with 6 characters it will not have the desired value. And is the value of "city" numeric or character? It city is actually text, as implied from the name, then you almost certainly have 2 or more characters, which makes it even more likely that ID+City is longer than the existing ID variable can hold.
If that may be the case you need to insure that you have variable long enough to hold the longest expected value.
One way that works with what you show basically:
data have; input ID $ city $; datalines; 123abc 3 123abc 3 123abc 3 123abc 4 123abc 4 123abc 4 1245dc 6 ; proc sort data=have; by id city; run; data want; set have; by id city; length id2 $ 15; /*< length should be long enough to hold longest expected*/ retain citycount; if first.id then citycount=1; else if first.city then citycount+1; if citycount>1 then id2 =cats(id,city); else id2 = id; drop citycount; run;
Please note the first data step that creates data we can code with. It is preferred on this forum that you provide example data as a data step. That way we know exactly what type of variable, the properties such as numeric or character, length or format if any. Otherwise solutions that might work will be suggested but may not work with your data because the values are not numeric when assumed so. Which is a very critical item when you get to dealing with dates.
data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;
data want(rename=(Id_=Id));
length Id_ $10.;
set have;
by id city;
retain sfx;
if first.id then
sfx='';
if ^ first.id and first.city then
sfx=City;
ID_=cats(ID,sfx);
drop id sfx;
run;
data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;
data want;
set have;
by id city;
length new_id $50;
retain new_id;
if first.id then call missing(new_id);
else if first.city then new_id= catx("-", ID, city);
run;
/*or something fancy*/
data want;
do _n_=1 by 1 until(last.id);
do until(last.city);
set have;
by id city;
length new_id $50;
if _n_>1 then new_id= catx("-", ID, city);
output;
end;
end;
run;
Thank you for your help. The city values are not consecutive numbers. It is just a value and may be any number. Does your code account for this or in that case I need another code to do this? Also, If there are more than two cities with the same ID, but different city values, does your code handle that too?
Hi @mmhxc5 I believe it does. But why not test it to confirm? So please do some tests before you accept "any" solution.
In many cases, this is a waste of time. You might not want to process your data BY ID, but you can instead process it BY ID CITY. You don't need to concatenate the two variables to do this.
Of course, we don't know your intentions and what processing you need to go through ... but if you add some details it will be very likely that you don't need to concatenate the variables.
Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.
Just to take one example from your analysis ...
PROC LIFETEST supports both a BY statement and an ID statement. It's not clear which one you intend to use ... but both statements let you specify two variables instead of one. You don't need a single variable to identify your unit of measurement.
Here's a way to examine for uniqueness:
proc freq data=have;
tables id * city ./ list missing;
run;
@mmhxc5 wrote:
Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.
If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.
If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.
...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.
data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
data want(drop=_:);
if _n_=1 then
do;
if 0 then set have(rename=(id=id2 city=_city));
dcl hash hhave();
hhave.defineKey('id');
hhave.defineData('_city');
hhave.defineDone();
dcl hash hwant();
hwant.defineKey('city','id');
hwant.defineData('city','id2');
hwant.defineDone();
end;
call missing(of _all_);
set have;
if hwant.find() ne 0 then
do;
if hhave.find() ne 0 then
do;
_city=city;
id2=id;
hhave.add();
hwant.add();
end;
else
if city ne _city then
do;
id2=cats(id,city);
hwant.add();
end;
end;
run;
proc print;
run;
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.