SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

To compare two input files in Data Management Studio 2.4

Reply
Contributor
Posts: 21

To compare two input files in Data Management Studio 2.4


I have two input files in data job to compare the data. The requirement is, if the JC  (field) is changed from previous file to current file (if the JC changes from alphanumeric character to numeric character) then it should display the output.

For e.g.

File 1                                                                   File 2

Previous Table Current Table
PNOJCName PNOJCName
111CMR1xxxx 11120xxxx
112SAL2yyyy 11240yyyy
11320zzzz 11320zzzz
11440aaaa 11440aaaa
115CMR1bbbb 115CMR2bbbb
116CMR1cccc 116CMR3cccc
11720dddd 11720dddd
11820eeee 12020llll
11920ffff 12120kkkk

For the above table the output should be as below becoz the data is changing from CMR1 to 20, i.e. alphanumeric to numeric.

PNOJCName
11120xxxx
11240yyyy

Regards,

Shaheen

Contributor
Posts: 21

Re: To compare two input files in Data Management Studio 2.4

Hello All,

Any input or view point on the above question?

Regards,

Shaheen

Super User
Posts: 6,963

Re: To compare two input files in Data Management Studio 2.4

Should any change be caught (ie 20 to 40), or just a change in type?

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Contributor
Posts: 21

Re: To compare two input files in Data Management Studio 2.4

Its mainly change in type. No other changes.

Super User
Posts: 6,963

Re: To compare two input files in Data Management Studio 2.4

This will detect changes from alpha to numeric:

data want (keep=PNO JC Name);

merge

  file1 (in=in1 rename=(JC=JC_old))

  file2 (in=in2 drop=Name)

;

by PNO;

if in1 and in2;

int_string = trim(JC_old); * get rid of leading blanks;

num = notdigit(int_string);

if num ne 0 and num ne length(int_string) + 1; * JC_old is alpha;

int_string = trim(JC); * get rid of leading blanks;

num = notdigit(int_string);

if num = 0 or num = length(int_string) + 1; * JC is numeric;

run;

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: To compare two input files in Data Management Studio 2.4

You might want to check the documentation regarding TRIM function.

Super User
Posts: 6,963

Re: To compare two input files in Data Management Studio 2.4

Yeah, right.

s/trim/strip/g

---------------------------------------------------------------------------------------------
Maxims of Maximally Efficient SAS Programmers
Respected Advisor
Posts: 3,777

Re: To compare two input files in Data Management Studio 2.4

This might adequate and is pretty simple.

data Previous;
   input (PNO  JC Name)($);
   cards;
111   CMR1  xxxx
112   SAL2  yyyy
113   20 zzzz
114   40 aaaa
115   CMR1  bbbb
116   CMR1  cccc
117   20 dddd
118   20 eeee
119   20 ffff
;;;;
   run;
proc print;
  
run;
data Current;
   input (PNO  JC Name)($);
   cards;
111   20 xxxx
112   40 yyyy
113   20 zzzz
114   40 aaaa
115   CMR2  bbbb
116   CMR3  cccc
117   20 dddd
120   20 llll
121   20 kkkk
;;;;
   run;
proc print;
  
run;

data want;
   merge
      Previous(
in=in1 rename=(JC=PrevJC))
      Current(
in=in2  rename=(name=CurName))
      ;
   by PNO;
   if in1 and in2;
   n1 = input(prevjc,?? f8.);
   n2 = input(jc    ,?? f8.);
   if missing(n1) and not missing(n2);
    drop n1 n2;
   run;
proc print;
  
run;
Super User
Posts: 9,687

Re: To compare two input files in Data Management Studio 2.4

Code: Program

data Previous;
   input (PNO  JC Name)($);
   cards;
111 CMR1 xxxx
112 SAL2 yyyy
113 20 zzzz
114 40 aaaa
115 CMR1 bbbb
116 CMR1 cccc
117 20 dddd
118 20 eeee
119 20 ffff
;;;;
   run;

data Current;
   input (PNO  JC Name)($);
   cards;
111 20 xxxx
112 40 yyyy
113 20 zzzz
114 40 aaaa
115 CMR2 bbbb
116 CMR3 cccc
117 20 dddd
120 20 llll
121 20 kkkk
;;;;
   run;


data want;
   merge
   Previous( rename=(JC=PrevJC))
   Current(rename=(name=CurName))
   ;
   by PNO;
if prxmatch('/^[a-z]+\d+$/i',strip(PrevJC)) and prxmatch('/^\d+$/i',strip(JC))  ;
   run;

Xia Keshan

Message was edited by: xia keshan

Message was edited by: xia keshan fix a mistake.

SAS Employee
Posts: 85

Re: To compare two input files in Data Management Studio 2.4

Shah,

Here is a solution in DM Studio. This solution works regardless of the field type on the input table. It scans each character and creates a profile of the field (JC). The data is joined and then an expression identifies where the field profile is different and outputs that data. The expression is quite simple:

`char_type_prev` != `char_type_curr`

2015-06-09 09_17_59-DataFlux Data Management Studio 2.3.png

2015-06-09 09_19_42-DataFlux Data Management Studio 2.3.png

The data job is also attached.

-shawn

Attachment
Contributor
Posts: 21

Re: To compare two input files in Data Management Studio 2.4

Hi Shawn,

Please attach the ddf file once again, I am not able to open it. You have used job specific data node as input file and i am using data source node as input file as there are other fields which has to be populated. I didnt get the usage of expression. Can you please eloborate it?

Hi Xia,

The code you have written, ist specially for the input i have specified. My input will keep changing. There is no fixed input to use it in coding and also are you using the expression here, it seems to me as complete coding and in data flux we have nodes and expressions. I am not good at technical. Please help me in understanding.

And also TRIM function is not very usefull.

Regards,

Shaheen

SAS Employee
Posts: 85

Re: To compare two input files in Data Management Studio 2.4

Shah,

I just used the job specific data node for demonstration purposes. If you replace this with your data inputs, assuming the column names are the same, the logic will work as expected.

This is the expression logic used in the 'Field Profile' node:

/* Pre-Expression */

hidden real field_length

hidden integer cnt

string char_type

cnt = 1

field_length = len(`JC`)

char_type = ''

while cnt <= field_length

  begin

  if isnumber(mid(`JC`, cnt, 1))

  char_type = char_type & ' ' & 'numeric'

  else

  char_type = char_type & ' ' & 'alpha'

  cnt = cnt + 1

  end

cnt = 1


What this expression does is loop through the entire value to determine if you have all numeric, all alpha or a mix (alphanumeric). Because your input (the JC Column) can be characters or numeric the column type is set to a character (string) field. You must look at a profile of the field's value instead of the field type based on your requirement.

-shawn

Message was edited by: shawn skillman Unable to reattach the DDF.

Occasional Contributor
Posts: 6

Re: To compare two input files in Data Management Studio 2.4

Hello,

               I really liked your solution, very creative.

               But it would be possible to use the  function pattern(* string *), which returns something similar to the created by their EEL ( char_type_curr == W_PATTERN_curr ):

PNO_currJC_currName_currW_PATTERN_currchar_type_currPNO_prevJC_prevName_prevW_TIPO_PREVchar_type_prevW_PATTERN_prev
11720dddd99 numeric numeric11720ddddinteger numeric numeric99
116CMR1ccccAAA9 alpha alpha alpha numeric116CMR3ccccstring alpha alpha alpha numericAAA9
115CMR1bbbbAAA9 alpha alpha alpha numeric115CMR2bbbbstring alpha alpha alpha numericAAA9
11440aaaa99 numeric numeric11440aaaainteger numeric numeric99
11320zzzz99 numeric numeric11320zzzzinteger numeric numeric99
112SAL2yyyyAAA9 alpha alpha alpha numeric11240yyyyinteger numeric numeric99
111CMR1xxxxAAA9 alpha alpha alpha numeric11120xxxxinteger numeric numeric99

   This solution could also be a ??

Thanks,

Ricardo Benatti.

SAS Employee
Posts: 85

Re: To compare two input files in Data Management Studio 2.4

The code I supplied essentially created a user-defined function vs. using an existing function. Creating a user-defined function can enable the users to see what is happening behind the scenes with the pattern() function.

Either way will work.

Occasional Contributor
Posts: 6

Re: To compare two input files in Data Management Studio 2.4

so ...

     also understand how you ...

     EEL exemplifies the treatment attribute ...

     the functions implemented by the solution should have a better performance ...

     the intention was to contribute only ...

Regards,

Ask a Question
Discussion stats
  • 16 replies
  • 983 views
  • 4 likes
  • 6 in conversation