Hi,
I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract an observation that has different value in nay variable compare to last month dataset. For example below:
DATA SasConf;
INFILE DATALINES;
INPUT
ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL
;;;;
DATA SasConf2;
INFILE DATALINES;
INPUT ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
;;;;
Compare these two datasets and the output datset should have following obervations:
3 NESUG 2006 Philadelphia PA
8 NESUGI 2007 Orlando FL
Note: there is primary key (ID) in the dataset, so any variable value can be different, so need to select the observation which is changed from the last month.
is the way that we can do this on the data step (don’t want to use proc compare).
Thank you for all your help.
A Data Step Approach.
For each ID, concatenate all variables into a LIST, for this purpose, confYear can be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.
The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.
I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.
DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL
;;;;
run;
DATA SasConf2;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
;;;;
run;
data want;
if _n_ = 1 then do;
array k[8] $35 _temporary_ ;
length list $35;
do j = 1 by 1 until(eof);
set sasconf2 end = eof;
array vars ConfName -- ConfST;
do i = 1 to dim(vars);
call catx('|', list, trim(vars[i]));
end;
k[j] = list;
call missing(list);
end;
end;
set sasconf;
call missing(list);
do i = 1 to dim(vars);
call catx('|', list, trim(vars[i]));
end;
if k[id] ^= list then output;
drop list i j;
run;
proc print data = want;
run;
Hi,
I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract observations that has different value in nay varibale compare to last month dataset. For example below:
DATA SasConf;
INFILE DATALINES;
INPUT
ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL;
DATA SasConf2;
INFILE DATALINES;
INPUT ID
ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
Compare these two datasets and the output datset should have following obervations:
3 NESUG 2006 Philadelphia PA
8 NESUGI 2007 Orlando FL
Note: there is primary key (ID) in the dataset, so any varibale value can be different, so need to select the observation which is changed from the last month .
is the way that we can do this on the data step (dont want to use proc compare).
Thank you for all your help.
Your data read doesn't work correctly as two-word city names end up in different columns but the result you're looking for can be obtained with a proc compare like this:
proc compare base=sasconf compare=sasconf2 out=comp noprint outcompare outnoequal;
by id;
run;
Like this?
proc sql;
select * from T2
except
select * from T1 ;
quit;
A Data Step Approach.
For each ID, concatenate all variables into a LIST, for this purpose, confYear can be read as a string. Do this with one data set. Then read the other data set and make a concatenated LIST for each observation. For the given ID, check the LIST obtained in the previous data set. If the two are not equal, then output the record. I use a string array to hold the LIST, the array is referenced by the ID as Index of the array.
The progam can me made dynamic by getting the number of IDs and the total string-lengths of all variables prior to the data step and fill these in the appropriate statements.
I am getting records 3, 7 and 8 against the reported 3 and 8. If I have missed something, let me know.
DATA SasConf;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 newyork NY
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 MWSUG 2006 Dearborn MI
8 SUGI 2007 Orlando FL
;;;;
run;
DATA SasConf2;
INFILE DATALINES;
INPUT ID ConfName :$9. ConfYear $4. ConfCity &$15. ConfST $ ;
DATALINES4;
1 SUGI 2006 San Francisco CA
2 PHARMASUG 2006 Bonita Springs FL
3 NESUG 2006 Philadelphia PA
4 WUSS 2006 Irvine CA
5 SESUG 2006 Atlanta GA
6 SCSUG 2006 Irving TX
7 PNWSUG 2006 Seaside OR
8 NESUGI 2007 Orlando FL
;;;;
run;
data want;
if _n_ = 1 then do;
array k[8] $35 _temporary_ ;
length list $35;
do j = 1 by 1 until(eof);
set sasconf2 end = eof;
array vars ConfName -- ConfST;
do i = 1 to dim(vars);
call catx('|', list, trim(vars[i]));
end;
k[j] = list;
call missing(list);
end;
end;
set sasconf;
call missing(list);
do i = 1 to dim(vars);
call catx('|', list, trim(vars[i]));
end;
if k[id] ^= list then output;
drop list i j;
run;
proc print data = want;
run;
Thank You ! This code works great.
Is there any way that we can identify which variable values has changed in a row when compared to the last month dataset ?
Thanks for your help.
if k[id] ^= list then output;
Do you want the first mismatch or all mismatches? Well I will sketch the way to do it. You may try and be proud to have done it.
The above statement is the place you have to start with as:
if k[id] ^= list then do;
*do your work here.
.......
end;
At present you have VAR[*], an array holding values for the the variables from one data set and K[ID] holds the combined list values for variables of the other data set. Using VAR[*], take each variable and see whether it matches in K[ID]. If it is not, then it is the first mismatched variable. If you need all unmatched, try with next variable from VAR[*] and on. You may use WHICHC() function for finding the match or no match.
Proc compare can do that for you without any complicated coding. Check the outnoequal, outdif, outbase and outcompare options out:
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.