Solved: Re: Data Manipulation and clean up

NikosStratis · Posted 02-11-2020 02:19 AM

Hello team!

I have brainteaser… I need to clean up a file with extra entries. I have the following data:

data work.have;

infile datalines delimiter=',';

length Field_A $8 Field_B $8;

input Field_A $ Field_B $;

datalines;

13976907,12555848

12555848,13976907

13589994,11280580

11280580,13589994

10947277,10808526

10808526,10947277

12334829,92335000

92335000,12334829

;

run;

What I need to do is the following:

When Field_A from row 1 is equal to Field_B from row 2

Then I need to keep only row 1.

The data comes in pairs of two rows and it is like this,

row1.Field_a = row2.Filed_B AND row1.Field_B = Row2.Field_A

my output shall be:

13976907,12555848

13589994,11280580

10947277,10808526

12334829,92335000

I haven’t done anything like that in the past… can this bee accomplished?

Any help is welcomed!

Thank you in advance!

NikosStratis · Posted 02-11-2020 04:07 AM

It works just fine!

View solution in original post

ed_sas_member · Posted 02-11-2020 03:52 AM

Hi @NikosStratis

Does this code meet your expectations:

data want (drop=Field_B_2);
	merge have have (keep=Field_B rename=(Field_B = Field_B_2) firstobs=2);
	if Field_A = Field_B_2 then output;
run;

Kurt_Bremser · Posted 02-11-2020 03:58 AM

By using the lag() function, and modulo of _n_:

data work.have;
infile datalines delimiter=','; 
length Field_A $8 Field_B $8;
input Field_A $ Field_B $;
datalines;
13976907,12555848
12555848,13976907
13589994,11280580
11280580,13589994
10947277,10808526
10808526,10947277
12334829,92335000
92335000,12334829
;

data want;
set have;
if mod(_n_,2) = 0 and lag(field_a) = field_b then delete;
run;

proc print data=want noobs;
run;

Result:

Field_A     Field_B

13976907    12555848
13589994    11280580
10947277    10808526
12334829    92335000

Please use the "little running man" for posting code (as I did), so that the code is kept as-is. Posting code into the main posting window will cause unwanted changes (extra lines, smileys, ...)

Maxims of Maximally Efficient SAS Programmers
How to convert datasets to data steps
The macro for direct download as ZIP
How to post code
Please vote for Provide Sequential Search Capability for Hash Objects
How to deal with locked files on UNIX

NikosStratis · Posted 02-11-2020 04:07 AM

Thank you so much!

Iτ ςορκσ ξθστ φινε!

NikosStratis · Posted 02-11-2020 04:07 AM

It works just fine!

gamotte · Posted 02-11-2020 06:28 AM

The accepted answer is displayed just after the question in the thread so that interested readers can find it quickly.

You should thus select the answer that solved your problem, here , i think @ed_sas_member 's or @Kurt_Bremser 's.

gamotte · Posted 02-11-2020 04:12 AM

The data comes in pairs of two rows and it is like this,

row1.Field_a = row2.Filed_B AND row1.Field_B = Row2.Field_A

So you just need to drop even rows :

data work.have;
infile datalines delimiter=','; 
length Field_A $8 Field_B $8;
input Field_A $ Field_B $;

if mod(_N_,2);

datalines;
13976907,12555848
12555848,13976907
13589994,11280580
11280580,13589994
10947277,10808526
10808526,10947277
12334829,92335000
92335000,12334829
;
run;

Ksharp · Posted 02-11-2020 05:58 AM

data work.have;
infile datalines delimiter=','; 
length Field_A $8 Field_B $8;
input Field_A $ Field_B $;
datalines;
13976907,12555848
12555848,13976907
13589994,11280580
11280580,13589994
10947277,10808526
10808526,10947277
12334829,92335000
92335000,12334829
;

data temp;
 set have;
 call sortc(field_a,field_b);
run;
proc sort data=temp out=want nodupkey;
by field_a field_b;
run;

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away