BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
mmhxc5
Quartz | Level 8

Hi all,

I have a data set where a small number of values are the same but actually they belong to different places. I want to scan the data set and if the first values are the same but the second values are different then I want to just append the second value to the values that are the same as others. Here is an example,

data have;

 ID             city

123abc        3

123abc        3

123abc        3

123abc        4

123abc        4

123abc        4

1245 dc       6

 

data want;

ID             city

123abc        3

123abc        3

123abc        3

123abc4      4

123abc4      4

123abc4      4

1245dc       6

  

As shown above, I want to add the city value at the end of the ID value for the city equal to 4  because I need a data set only with distinct IDs if they belong to different places.  

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@mmhxc5 wrote:

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.


If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.

If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.

 

...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.

data have;
  input ID $ city $;
  cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set have(rename=(id=id2 city=_city));
      dcl hash hhave();
      hhave.defineKey('id');
      hhave.defineData('_city');
      hhave.defineDone();
      dcl hash hwant();
      hwant.defineKey('city','id');
      hwant.defineData('city','id2');
      hwant.defineDone();
    end;
  call missing(of _all_);

  set have;

  if hwant.find() ne 0 then
    do;
      if hhave.find() ne 0 then 
        do;
          _city=city;
          id2=id;
          hhave.add();
          hwant.add();
        end;
      else
      if city ne _city then 
        do;
          id2=cats(id,city);
          hwant.add();
        end;
    end;
run;

proc print;
run;

Patrick_0-1592604722700.png

 

 

View solution in original post

14 REPLIES 14
smantha
Lapis Lazuli | Level 10

 

proc sort data=have; by id city; run;

data want;

set have;

by id city;

retain id2;

if first.id then id2=id;

if first.city and ^first.id then id2 = id||city;

run;

 

I hope this helps

smantha
Lapis Lazuli | Level 10
proc sort data=have; by id city; run;

data want(rename=(id2=id));

set have;

by id city;

retain id2;

if first.id then id2=id;

if first.city and ^first.id then id2 = id||city;
drop id;
run;
Reeza
Super User
Slightly easier seems to me the idea to just add the city at the end?

unique_ID = catx("-", ID, city);
mmhxc5
Quartz | Level 8

Thank you for your help. Would this code add all 'city' values to the IDs? I want the first ID associated with a city to be as it is and all other IDs that are associated with other cities (any number of cities) to have the city value to be added to the ID. 

Reeza
Super User
Conditional logic makes it harder to use later on as some IDs have one portion and others do not. Its more logical IMO to just add all the City values to your IDs to create unique IDs overall that are consistent.
ballardw
Super User

What is the defined length of the ID variable? If the variable has a maximum length of 6, implied from the shown example, and you attempt to add a suffix of one character to a value with 6 characters it will not have the desired value. And is the value of "city" numeric or character? It city is actually text, as implied from the name, then you almost certainly have 2 or more characters, which makes it even more likely that ID+City is longer than the existing ID variable can hold.

If that may be the case you need to insure that you have  variable long enough to hold the longest expected value.

 

One way that works with what you show basically:

data have;
   input  ID $  city $;
datalines;
123abc        3
123abc        3
123abc        3
123abc        4
123abc        4
123abc        4
1245dc       6
;

proc sort data=have; 
   by id city; 
run;
data want;
   set have;
   by id city;
   length id2 $ 15; /*< length should be long enough to hold longest expected*/
   retain citycount;
   if first.id then citycount=1;
   else if first.city then citycount+1;
   if citycount>1 then id2 =cats(id,city);
   else id2 = id;
   drop citycount;
run;

Please note the first data step that creates data we can code with. It is preferred on this forum that you provide example data as a data step. That way we know exactly what type of variable, the properties such as numeric or character, length or format if any. Otherwise solutions that might work will be suggested but may not work with your data because the values are not numeric when assumed so. Which is a very critical item when you get to dealing with dates.

r_behata
Barite | Level 11
data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;

data want(rename=(Id_=Id));
	length Id_ $10.;
	set have;
	by id city;
	retain sfx;

	if first.id then
		sfx='';

	if ^ first.id and first.city then
		sfx=City;
	ID_=cats(ID,sfx);
	
	drop id sfx;
run;
novinosrin
Tourmaline | Level 20


data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;
data want;
 set have;
 by id city;
 length  new_id $50;
 retain new_id;
 if first.id then call missing(new_id);
 else if first.city then new_id= catx("-", ID, city);
run;
/*or something fancy*/
data want;
 do _n_=1 by 1 until(last.id);
  do until(last.city);
   set have;
   by id city;
   length  new_id $50;
   if _n_>1 then new_id= catx("-", ID, city);
   output;
  end;
 end;
run;
mmhxc5
Quartz | Level 8

Thank you for your help. The city values are not consecutive numbers. It is just a value and may be any number. Does your code account for this or in that case I need another code to do this? Also, If there are more than two cities with the same ID, but different city values, does your code handle that too?

novinosrin
Tourmaline | Level 20

Hi @mmhxc5  I believe it does. But why not test it to confirm? So please do some tests before you accept "any" solution. 

Astounding
PROC Star

In many cases, this is a waste of time.  You might not want to process your data BY ID, but you can instead process it BY ID CITY.  You don't need to concatenate the two variables to do this.

 

Of course, we don't know your intentions and what processing you need to go through ... but if you add some details it will be very likely that you don't need to concatenate the variables.

mmhxc5
Quartz | Level 8

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.

Astounding
PROC Star

Just to take one example from your analysis ...

 

PROC LIFETEST supports both a BY statement and an ID statement.  It's not clear which one you intend to use ... but both statements let you specify two variables instead of one.  You don't need a single variable to identify your unit of measurement.

 

Here's a way to examine for uniqueness:

 

proc freq data=have;

tables id * city ./ list missing;

run;

Patrick
Opal | Level 21

@mmhxc5 wrote:

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.


If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.

If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.

 

...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.

data have;
  input ID $ city $;
  cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set have(rename=(id=id2 city=_city));
      dcl hash hhave();
      hhave.defineKey('id');
      hhave.defineData('_city');
      hhave.defineDone();
      dcl hash hwant();
      hwant.defineKey('city','id');
      hwant.defineData('city','id2');
      hwant.defineDone();
    end;
  call missing(of _all_);

  set have;

  if hwant.find() ne 0 then
    do;
      if hhave.find() ne 0 then 
        do;
          _city=city;
          id2=id;
          hhave.add();
          hwant.add();
        end;
      else
      if city ne _city then 
        do;
          id2=cats(id,city);
          hwant.add();
        end;
    end;
run;

proc print;
run;

Patrick_0-1592604722700.png

 

 

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 14 replies
  • 1616 views
  • 6 likes
  • 8 in conversation