Re: Compare and remove

Zakharkou · Posted 11-18-2023 02:46 AM

Hello dear SAS community.

I have the following question, there are two variables (id1 and id2) in a table (see below), which should be compared with each other.

Is it possible to delete the matching observations from this table (position of the observation does not matter), so that only the unique observations for these two variables remain there?

id1	id2
990000	990000
990002	990000contour
990003	990002
990006	990003
990007	990006
990008	990007
990016	990008
990018	990016
990021	990018
990024	990021

I would be delighted to receive an answer.

Quentin · Posted 11-18-2023 10:13 AM

Can you show the code you have tried? Are you familiar with the IF statement, and the DELETE statement?

BASUG is hosting free webinars ! Check out our recordings of past webinars: https://www.basug.org/videos. Be sure to subscribe to our email list for notification of future BASUG events.

Zakharkou · Posted 11-18-2023 12:41 PM

I have tried different variants, but have had no success. I think that due to the problem that the rows are sometimes offset, it is more likely to end up with proc sql.

Tom · Posted 11-18-2023 10:27 AM

I cannot figure out what you mean. Can you show what you would want to get for the example input you showed?

Zakharkou · Posted 11-18-2023 12:39 PM

Yes, of course. Thanks for the feedback.

I would like to use this data set:

id1	id2
990000	990000
990002	990000contour
990003	990002
990006	990003
990007	990006
990008	990007
990016	990008
990018	990016
990021	990018
990024	990021

to create the following data set:

id1	id2
990024	990000contour

Ultimately, it's about removing the repetitions on both sides of the table (id1 and id2), even if the rows are "offset".

Tom · Posted 11-18-2023 02:24 PM

Ok. So know we know what you want as the output, but WHY.

Why would you pair 990024 with 990000contour?

They appear to have nothing to do with each other in the original dataset.

The only way I can see to get those two values out is to treat the values as ONE variable, not TWO.

data list;
  input id :$20. @@;
cards;
990000	990000
990002	990000contour
990003	990002
990006	990003
990007	990006
990008	990007
990016	990008
990018	990016
990021	990018
990024	990021
;

proc sql;
create table want as 
  select id
  from list
  group by id
  having count(*)=1
;
quit;

proc print;
run;

Obs    id

 1     990000contour
 2     990024

andreas_lds · Posted 11-20-2023 02:44 AM

Could be solved by a data step:

data want;
   merge
      have(keep= id1 rename=(id1=id) in=inOne)
      have(keep= id2 rename=(id2=id) in=inTwo)
   ;
   by id;

   if inOne + inTwo = 1;
run;

EDIT: Both id-variables must be sorted for this step to work, so using one of the other solution is recommended.

mkeintz · Posted 11-24-2023 01:57 PM

As a minor variation, if you want to keep both variables ID1 and ID2, but with a unique row for each unique value, you could:

data work.have;
  input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;

data want (label="unmatched ID1's and unmatched ID2's");
  set have (keep=id1 rename=id1=id in=in1)
      have (keep=id2 rename=id2=id in=in2);
  by id;
  if first.id=1 and last.id=1;
  if in1=1 then id1=id;
  else id2=id;
  drop id;
run;

proc print;
run;

which generates:

The SAS System

Obs	id1	id2
1		990000contour
2	990024

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Patrick · Posted 11-18-2023 07:17 PM

Here one option.

data work.have;
  input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;


proc sql;
  title 'A EXCLUSIVE UNION B';
  (
    select id1 as id from work.have
    except
    select id2 as id from work.have
  )
  union
  (
    select id2 as id from work.have
    except
    select id1 as id from work.have
  )
  ;
quit;

Above code based on SAS Docu Producing Rows from the First Query or the Second Query

Zakharkou · Posted 11-24-2023 01:43 AM

It looks almost as desired. Would it be possible not to merge the 2 columns, but to leave them as they were with id1 and id2?

andreas_lds · Posted 11-24-2023 03:16 AM

@Zakharkou wrote:
It looks almost as desired. Would it be possible not to merge the 2 columns, but to leave them as they were with id1 and id2?

Here's a lengthy solution:

data work.have;
  input id1 :$20. id2 :$20.;
cards;
990000 990000
990002 990000contour
990003 990002
990006 990003
990007 990006
990008 990007
990016 990008
990018 990016
990021 990018
990024 990021
;


data work.stacked;
	set work.have;
	
	array ids[2] id1 id2;
	
	do source = 1 to dim(ids);
		id = ids[source];
		output;
	end;
	
	drop id1 id2;
run;

proc sort data=work.stacked;
	by id;
run;

data work.unique;
	set work.stacked;
	by id;
	
	if first.id and last.id;
run;

proc sort data=work.unique;
	by source;
run;

proc transpose data=work.unique out=want(drop=_name_) prefix=id;
	id source;
	var id;
run;

Patrick · Posted 11-24-2023 06:38 AM

@Zakharkou wrote:
It looks almost as desired. Would it be possible not to merge the 2 columns, but to leave them as they were with id1 and id2?

Below should work

proc sql;
  title 'A EXCLUSIVE UNION B';
  (
    select id1, ' ' as id2 from work.have
    except
    select id2 as id1, ' ' as id2 from work.have
  )
  union all corr
  (
    select id2, ' ' as id1 from work.have
    except
    select id1 as id2, ' ' as id1 from work.have
  )
  ;
quit;

Ready to join fellow brilliant minds for the SAS Hackathon?

Classroom Training Available!