DATA Step, Macro, Functions and more

Difference between Two dataset variable values

Accepted Solution Solved
Reply
Contributor
Posts: 33
Accepted Solution

Difference between Two dataset variable values

[ Edited ]

Hi,

I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract an observation that has different value in nay variable compare to last month dataset. For example below:

 

 

 DATA SasConf;
INFILE DATALINES;
INPUT
ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL
;;;;
 
DATA SasConf2;
INFILE DATALINES;
INPUT  ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
 1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
;;;;

 

Compare these two datasets and the output datset should have following obervations:

3 NESUG 2006 Philadelphia PA

8 NESUGI 2007 Orlando FL

 

Note: there is primary key (ID) in the dataset, so any variable value can be different, so need to select the observation which is changed from the last month.

 is the way that we can do this on the data step (don’t want to use proc compare).

 

Thank you for all your help.

 


Accepted Solutions
Solution
‎02-22-2017 01:00 PM
Super Contributor
Posts: 298

Re: Difference between Two dataset variable values

A Data Step Approach.

 

For each ID, concatenate all variables into a LIST, for this purpose, confYear can  be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.

The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.

I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.

DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006   San Francisco   CA
2 PHARMASUG 2006   Bonita Springs   FL
3 NESUG 2006   newyork    NY
4 WUSS 2006   Irvine    CA
5 SESUG 2006   Atlanta    GA
6 SCSUG 2006   Irving    TX
7 MWSUG 2006   Dearborn    MI
8 SUGI 2007   Orlando    FL
;;;;
run;


DATA SasConf2;
INFILE DATALINES;
INPUT  ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco   CA
2 PHARMASUG 2006 Bonita Springs   FL
3 NESUG 2006 Philadelphia   PA
4 WUSS 2006 Irvine   CA
5 SESUG 2006 Atlanta   GA
6 SCSUG 2006 Irving   TX
7 PNWSUG 2006 Seaside   OR
8 NESUGI 2007 Orlando   FL
;;;;
run;


data want;
	if _n_ = 1 then do;
		array k[8] $35 _temporary_ ;
		length list $35;
		do j = 1 by 1 until(eof);
			set sasconf2 end = eof;
			array vars ConfName -- ConfST;
			do i = 1 to dim(vars);
				call catx('|', list, trim(vars[i]));
			end;
			k[j] = list;
			call missing(list);
		end;
	end;
	set sasconf;
		call missing(list);
		do i = 1 to dim(vars);
			call catx('|', list, trim(vars[i]));
		end;
		if k[id] ^= list then output;
drop list i j;	
run;
proc print data = want;
run;

View solution in original post


All Replies
Contributor
Posts: 33

Difference between Two dataset variable values

 

Hi,

I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract observations that has different value in nay varibale compare to last month dataset. For example below:

 

DATA SasConf;

INFILE DATALINES;

INPUT

ID

ConfName $

ConfYear

ConfCity $

ConfST $ ;

DATALINES4;

1 SUGI 2006 San Francisco CA

2 PHARMASUG 2006 Bonita Springs FL

3 NESUG 2006 newyork NY

4 WUSS 2006 Irvine CA

5 SESUG 2006 Atlanta GA

6 SCSUG 2006 Irving TX

7 MWSUG 2006 Dearborn MI

8 SUGI 2007 Orlando FL;

 

DATA SasConf2;

INFILE DATALINES;

INPUT  ID

ConfName $

ConfYear

ConfCity $

ConfST $ ;

DATALINES4;

 

1 SUGI 2006 San Francisco CA

2 PHARMASUG 2006 Bonita Springs FL

3 NESUG 2006 Philadelphia PA

4 WUSS 2006 Irvine CA

5 SESUG 2006 Atlanta GA

6 SCSUG 2006 Irving TX

7 PNWSUG 2006 Seaside OR

8 NESUGI 2007 Orlando FL

 

Compare these two datasets and the output datset should have following obervations:

3 NESUG 2006 Philadelphia PA

8 NESUGI 2007 Orlando FL

 

Note: there is primary key (ID) in the dataset, so any varibale value can be different, so need to select the observation which is changed from the last month .

 

is the way that we can do this on the data step (dont want to use proc compare).

 

Thank you for all your help.

Frequent Contributor
Posts: 75

Re: Difference between Two dataset variable values

Your data read doesn't work correctly as two-word city names end up in different columns but the result you're looking for can be obtained with a proc compare like this:

proc compare base=sasconf compare=sasconf2 out=comp noprint outcompare outnoequal;
by id;
run;

 

 

PROC Star
Posts: 1,760

Re: Difference between Two dataset variable values

Like this?


proc sql;
  select * from T2
  except
  select * from T1 ;
quit;

 

Solution
‎02-22-2017 01:00 PM
Super Contributor
Posts: 298

Re: Difference between Two dataset variable values

A Data Step Approach.

 

For each ID, concatenate all variables into a LIST, for this purpose, confYear can  be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.

The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.

I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.

DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006   San Francisco   CA
2 PHARMASUG 2006   Bonita Springs   FL
3 NESUG 2006   newyork    NY
4 WUSS 2006   Irvine    CA
5 SESUG 2006   Atlanta    GA
6 SCSUG 2006   Irving    TX
7 MWSUG 2006   Dearborn    MI
8 SUGI 2007   Orlando    FL
;;;;
run;


DATA SasConf2;
INFILE DATALINES;
INPUT  ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco   CA
2 PHARMASUG 2006 Bonita Springs   FL
3 NESUG 2006 Philadelphia   PA
4 WUSS 2006 Irvine   CA
5 SESUG 2006 Atlanta   GA
6 SCSUG 2006 Irving   TX
7 PNWSUG 2006 Seaside   OR
8 NESUGI 2007 Orlando   FL
;;;;
run;


data want;
	if _n_ = 1 then do;
		array k[8] $35 _temporary_ ;
		length list $35;
		do j = 1 by 1 until(eof);
			set sasconf2 end = eof;
			array vars ConfName -- ConfST;
			do i = 1 to dim(vars);
				call catx('|', list, trim(vars[i]));
			end;
			k[j] = list;
			call missing(list);
		end;
	end;
	set sasconf;
		call missing(list);
		do i = 1 to dim(vars);
			call catx('|', list, trim(vars[i]));
		end;
		if k[id] ^= list then output;
drop list i j;	
run;
proc print data = want;
run;
Contributor
Posts: 33

Re: Difference between Two dataset variable values

Thank You ! This code works great.

 

Is there any way that we can identify which variable values has changed in a row when compared to the last month dataset ?

 

Thanks for your help.

Super Contributor
Posts: 298

Re: Difference between Two dataset variable values

if k[id] ^= list then output;

Do you want the first mismatch or all mismatches? Well I will sketch the way to do it. You may try and be proud to have done it.

The above statement is the place you have to start with as:

 

if k[id] ^= list then do;

*do your work here.

.......

end;

 

At present you have VAR[*], an array holding values for the the variables from one data set and K[ID] holds the combined list values for variables of the other data set. Using VAR[*], take each variable and see whether it matches in K[ID]. If it is not, then it is the first mismatched variable. If you need all unmatched, try with next variable from VAR[*] and on. You may use WHICHC() function for finding the match or no match.

Frequent Contributor
Posts: 75

Re: Difference between Two dataset variable values

Proc compare can do that for you without any complicated coding. Check the outnoequal, outdif, outbase and outcompare options out:

http://support.sas.com/documentation/cdl/en/proc/69850/HTML/default/viewer.htm#n1f136xdchlfrtn1663tm...

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 7 replies
  • 254 views
  • 1 like
  • 4 in conversation