Solved: Re: How to attach the second variable value to make the first variable...

mmhxc5 · Posted 06-15-2020 06:11 PM

Hi all,

I have a data set where a small number of values are the same but actually they belong to different places. I want to scan the data set and if the first values are the same but the second values are different then I want to just append the second value to the values that are the same as others. Here is an example,

data have;

ID city

123abc 3

123abc 4

1245 dc 6

data want;

ID city

123abc 3

123abc4 4

1245dc 6

As shown above, I want to add the city value at the end of the ID value for the city equal to 4 because I need a data set only with distinct IDs if they belong to different places.

Patrick · Posted 06-19-2020 06:12 PM

@mmhxc5 wrote:

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.

If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.

If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.

...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.

data have;
  input ID $ city $;
  cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set have(rename=(id=id2 city=_city));
      dcl hash hhave();
      hhave.defineKey('id');
      hhave.defineData('_city');
      hhave.defineDone();
      dcl hash hwant();
      hwant.defineKey('city','id');
      hwant.defineData('city','id2');
      hwant.defineDone();
    end;
  call missing(of _all_);

  set have;

  if hwant.find() ne 0 then
    do;
      if hhave.find() ne 0 then 
        do;
          _city=city;
          id2=id;
          hhave.add();
          hwant.add();
        end;
      else
      if city ne _city then 
        do;
          id2=cats(id,city);
          hwant.add();
        end;
    end;
run;

proc print;
run;

View solution in original post

smantha · Posted 06-15-2020 06:21 PM

proc sort data=have; by id city; run;

data want;

set have;

by id city;

retain id2;

if first.id then id2=id;

if first.city and ^first.id then id2 = id||city;

run;

I hope this helps

smantha · Posted 06-15-2020 06:42 PM

proc sort data=have; by id city; run;

data want(rename=(id2=id));

set have;

by id city;

retain id2;

if first.id then id2=id;

if first.city and ^first.id then id2 = id||city;
drop id;
run;

Reeza · Posted 06-15-2020 06:37 PM

Slightly easier seems to me the idea to just add the city at the end?

unique_ID = catx("-", ID, city);

mmhxc5 · Posted 06-17-2020 11:34 AM

Thank you for your help. Would this code add all 'city' values to the IDs? I want the first ID associated with a city to be as it is and all other IDs that are associated with other cities (any number of cities) to have the city value to be added to the ID.

Reeza · Posted 06-17-2020 11:44 AM

Conditional logic makes it harder to use later on as some IDs have one portion and others do not. Its more logical IMO to just add all the City values to your IDs to create unique IDs overall that are consistent.

ballardw · Posted 06-15-2020 06:41 PM

What is the defined length of the ID variable? If the variable has a maximum length of 6, implied from the shown example, and you attempt to add a suffix of one character to a value with 6 characters it will not have the desired value. And is the value of "city" numeric or character? It city is actually text, as implied from the name, then you almost certainly have 2 or more characters, which makes it even more likely that ID+City is longer than the existing ID variable can hold.

If that may be the case you need to insure that you have variable long enough to hold the longest expected value.

One way that works with what you show basically:

data have;
   input  ID $  city $;
datalines;
123abc        3
123abc        3
123abc        3
123abc        4
123abc        4
123abc        4
1245dc       6
;

proc sort data=have; 
   by id city; 
run;
data want;
   set have;
   by id city;
   length id2 $ 15; /*< length should be long enough to hold longest expected*/
   retain citycount;
   if first.id then citycount=1;
   else if first.city then citycount+1;
   if citycount>1 then id2 =cats(id,city);
   else id2 = id;
   drop citycount;
run;

Please note the first data step that creates data we can code with. It is preferred on this forum that you provide example data as a data step. That way we know exactly what type of variable, the properties such as numeric or character, length or format if any. Otherwise solutions that might work will be suggested but may not work with your data because the values are not numeric when assumed so. Which is a very critical item when you get to dealing with dates.

r_behata · Posted 06-15-2020 06:41 PM

data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;

data want(rename=(Id_=Id));
	length Id_ $10.;
	set have;
	by id city;
	retain sfx;

	if first.id then
		sfx='';

	if ^ first.id and first.city then
		sfx=City;
	ID_=cats(ID,sfx);
	
	drop id sfx;
run;

novinosrin · Posted 06-15-2020 07:14 PM



data have;
input ID $ city $;
cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;
run;
data want;
 set have;
 by id city;
 length  new_id $50;
 retain new_id;
 if first.id then call missing(new_id);
 else if first.city then new_id= catx("-", ID, city);
run;
/*or something fancy*/
data want;
 do _n_=1 by 1 until(last.id);
  do until(last.city);
   set have;
   by id city;
   length  new_id $50;
   if _n_>1 then new_id= catx("-", ID, city);
   output;
  end;
 end;
run;

mmhxc5 · Posted 06-17-2020 11:39 AM

Thank you for your help. The city values are not consecutive numbers. It is just a value and may be any number. Does your code account for this or in that case I need another code to do this? Also, If there are more than two cities with the same ID, but different city values, does your code handle that too?

novinosrin · Posted 06-17-2020 11:42 AM

Hi @mmhxc5 I believe it does. But why not test it to confirm? So please do some tests before you accept "any" solution.

Astounding · Posted 06-15-2020 11:43 PM

In many cases, this is a waste of time. You might not want to process your data BY ID, but you can instead process it BY ID CITY. You don't need to concatenate the two variables to do this.

Of course, we don't know your intentions and what processing you need to go through ... but if you add some details it will be very likely that you don't need to concatenate the variables.

mmhxc5 · Posted 06-17-2020 11:27 AM

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.

Astounding · Posted 06-19-2020 05:13 PM

Just to take one example from your analysis ...

PROC LIFETEST supports both a BY statement and an ID statement. It's not clear which one you intend to use ... but both statements let you specify two variables instead of one. You don't need a single variable to identify your unit of measurement.

Here's a way to examine for uniqueness:

proc freq data=have;

tables id * city ./ list missing;

run;

Patrick · Posted 06-19-2020 06:12 PM

@mmhxc5 wrote:

Thank you for your comment. The reason I need to add the 'city' value to the 'ID' is that I need to have a unique ID in the data set because I after this step I will calculate the number of times an ID appeared, but since several cities used the same ID it makes the number of times an ID appeared incorrect. Each ID has other variables associated with as well. First I need to make sure that the IDs are not the same for different cities and then I will use PROC lifetest and PHREG.

If like your sample data indicates an ID is always the same within a city then why not just execute the PROCs by city.

If ID is not unique within a city then you could run the analysis by CITY ID or by ID CITY as @Astounding proposes - or if you really want this ID City combination value then what @Reeza proposes sound simple enough and fit for purpose.

...But if you really only want to concatenate the city to the ID for the 2nd to nth occurrence where it's not unique for a city then something complicated like below should do the job.

data have;
  input ID $ city $;
  cards;
123abc 3
123abc 3
123abc 3
123abc 4
123abc 4
123abc 4
1245dc 6
;

data want(drop=_:);
  if _n_=1 then
    do;
      if 0 then set have(rename=(id=id2 city=_city));
      dcl hash hhave();
      hhave.defineKey('id');
      hhave.defineData('_city');
      hhave.defineDone();
      dcl hash hwant();
      hwant.defineKey('city','id');
      hwant.defineData('city','id2');
      hwant.defineDone();
    end;
  call missing(of _all_);

  set have;

  if hwant.find() ne 0 then
    do;
      if hhave.find() ne 0 then 
        do;
          _city=city;
          id2=id;
          hhave.add();
          hwant.add();
        end;
      else
      if city ne _city then 
        do;
          id2=cats(id,city);
          hwant.add();
        end;
    end;
run;

proc print;
run;

How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Re: How to attach the second variable value to make the first variable unique

Classroom Training Available!