BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
cho16
Obsidian | Level 7

Hi,

I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract an observation that has different value in nay variable compare to last month dataset. For example below:

 

 

 DATA SasConf;
INFILE DATALINES;
INPUT
ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL
;;;;
 
DATA SasConf2;
INFILE DATALINES;
INPUT  ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
 1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
;;;;

 

Compare these two datasets and the output datset should have following obervations:

3 NESUG 2006 Philadelphia PA

8 NESUGI 2007 Orlando FL

 

Note: there is primary key (ID) in the dataset, so any variable value can be different, so need to select the observation which is changed from the last month.

 is the way that we can do this on the data step (don’t want to use proc compare).

 

Thank you for all your help.

 

1 ACCEPTED SOLUTION

Accepted Solutions
KachiM
Rhodochrosite | Level 12

A Data Step Approach.

 

For each ID, concatenate all variables into a LIST, for this purpose, confYear can  be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.

The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.

I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.

DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006   San Francisco   CA
2 PHARMASUG 2006   Bonita Springs   FL
3 NESUG 2006   newyork    NY
4 WUSS 2006   Irvine    CA
5 SESUG 2006   Atlanta    GA
6 SCSUG 2006   Irving    TX
7 MWSUG 2006   Dearborn    MI
8 SUGI 2007   Orlando    FL
;;;;
run;


DATA SasConf2;
INFILE DATALINES;
INPUT  ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco   CA
2 PHARMASUG 2006 Bonita Springs   FL
3 NESUG 2006 Philadelphia   PA
4 WUSS 2006 Irvine   CA
5 SESUG 2006 Atlanta   GA
6 SCSUG 2006 Irving   TX
7 PNWSUG 2006 Seaside   OR
8 NESUGI 2007 Orlando   FL
;;;;
run;


data want;
	if _n_ = 1 then do;
		array k[8] $35 _temporary_ ;
		length list $35;
		do j = 1 by 1 until(eof);
			set sasconf2 end = eof;
			array vars ConfName -- ConfST;
			do i = 1 to dim(vars);
				call catx('|', list, trim(vars[i]));
			end;
			k[j] = list;
			call missing(list);
		end;
	end;
	set sasconf;
		call missing(list);
		do i = 1 to dim(vars);
			call catx('|', list, trim(vars[i]));
		end;
		if k[id] ^= list then output;
drop list i j;	
run;
proc print data = want;
run;

View solution in original post

7 REPLIES 7
cho16
Obsidian | Level 7

 

Hi,

I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract observations that has different value in nay varibale compare to last month dataset. For example below:

 

DATA SasConf;

INFILE DATALINES;

INPUT

ID

ConfName $

ConfYear

ConfCity $

ConfST $ ;

DATALINES4;

1 SUGI 2006 San Francisco CA

2 PHARMASUG 2006 Bonita Springs FL

3 NESUG 2006 newyork NY

4 WUSS 2006 Irvine CA

5 SESUG 2006 Atlanta GA

6 SCSUG 2006 Irving TX

7 MWSUG 2006 Dearborn MI

8 SUGI 2007 Orlando FL;

 

DATA SasConf2;

INFILE DATALINES;

INPUT  ID

ConfName $

ConfYear

ConfCity $

ConfST $ ;

DATALINES4;

 

1 SUGI 2006 San Francisco CA

2 PHARMASUG 2006 Bonita Springs FL

3 NESUG 2006 Philadelphia PA

4 WUSS 2006 Irvine CA

5 SESUG 2006 Atlanta GA

6 SCSUG 2006 Irving TX

7 PNWSUG 2006 Seaside OR

8 NESUGI 2007 Orlando FL

 

Compare these two datasets and the output datset should have following obervations:

3 NESUG 2006 Philadelphia PA

8 NESUGI 2007 Orlando FL

 

Note: there is primary key (ID) in the dataset, so any varibale value can be different, so need to select the observation which is changed from the last month .

 

is the way that we can do this on the data step (dont want to use proc compare).

 

Thank you for all your help.

nehalsanghvi
Pyrite | Level 9

Your data read doesn't work correctly as two-word city names end up in different columns but the result you're looking for can be obtained with a proc compare like this:

proc compare base=sasconf compare=sasconf2 out=comp noprint outcompare outnoequal;
by id;
run;

 

 

ChrisNZ
Tourmaline | Level 20

Like this?


proc sql;
  select * from T2
  except
  select * from T1 ;
quit;

 

KachiM
Rhodochrosite | Level 12

A Data Step Approach.

 

For each ID, concatenate all variables into a LIST, for this purpose, confYear can  be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.

The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.

I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.

DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006   San Francisco   CA
2 PHARMASUG 2006   Bonita Springs   FL
3 NESUG 2006   newyork    NY
4 WUSS 2006   Irvine    CA
5 SESUG 2006   Atlanta    GA
6 SCSUG 2006   Irving    TX
7 MWSUG 2006   Dearborn    MI
8 SUGI 2007   Orlando    FL
;;;;
run;


DATA SasConf2;
INFILE DATALINES;
INPUT  ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco   CA
2 PHARMASUG 2006 Bonita Springs   FL
3 NESUG 2006 Philadelphia   PA
4 WUSS 2006 Irvine   CA
5 SESUG 2006 Atlanta   GA
6 SCSUG 2006 Irving   TX
7 PNWSUG 2006 Seaside   OR
8 NESUGI 2007 Orlando   FL
;;;;
run;


data want;
	if _n_ = 1 then do;
		array k[8] $35 _temporary_ ;
		length list $35;
		do j = 1 by 1 until(eof);
			set sasconf2 end = eof;
			array vars ConfName -- ConfST;
			do i = 1 to dim(vars);
				call catx('|', list, trim(vars[i]));
			end;
			k[j] = list;
			call missing(list);
		end;
	end;
	set sasconf;
		call missing(list);
		do i = 1 to dim(vars);
			call catx('|', list, trim(vars[i]));
		end;
		if k[id] ^= list then output;
drop list i j;	
run;
proc print data = want;
run;
cho16
Obsidian | Level 7

Thank You ! This code works great.

 

Is there any way that we can identify which variable values has changed in a row when compared to the last month dataset ?

 

Thanks for your help.

KachiM
Rhodochrosite | Level 12
if k[id] ^= list then output;

Do you want the first mismatch or all mismatches? Well I will sketch the way to do it. You may try and be proud to have done it.

The above statement is the place you have to start with as:

 

if k[id] ^= list then do;

*do your work here.

.......

end;

 

At present you have VAR[*], an array holding values for the the variables from one data set and K[ID] holds the combined list values for variables of the other data set. Using VAR[*], take each variable and see whether it matches in K[ID]. If it is not, then it is the first mismatched variable. If you need all unmatched, try with next variable from VAR[*] and on. You may use WHICHC() function for finding the match or no match.

nehalsanghvi
Pyrite | Level 9

Proc compare can do that for you without any complicated coding. Check the outnoequal, outdif, outbase and outcompare options out:

http://support.sas.com/documentation/cdl/en/proc/69850/HTML/default/viewer.htm#n1f136xdchlfrtn1663tm...

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 7 replies
  • 2186 views
  • 1 like
  • 4 in conversation