Hi,
I have requirement to compare two monthly extracts which has same varibales, but values may be changed between two datasets. I need to extract observations that has different value in nay varibale compare to last month dataset. For example below:
DATA SasConf;
INFILE DATALINES;
INPUT ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
SUGI 2006 San Francisco CA
PHARMASUG 2006 Bonita Springs FL
NESUG 2006 newyork NY
WUSS 2006 Irvine CA
SESUG 2006 Atlanta GA
SCSUG 2006 Irving TX
MWSUG 2006 Dearborn MI
SUGI 2007 Orlando FL
;;;;
DATA SasConf2;
INFILE DATALINES;
INPUT ConfName $
ConfYear
ConfCity $
ConfST $ ;
DATALINES4;
SUGI 2006 San Francisco CA
PHARMASUG 2006 Bonita Springs FL
NESUG 2006 Philadelphia PA
WUSS 2006 Irvine CA
SESUG 2006 Atlanta GA
SCSUG 2006 Irving TX
PNWSUG 2006 Seaside OR
NESUGI 2007 Orlando FL
;;;;
Compare these two datasets and the output datset should have following obervations:
NESUG 2006 Philadelphia PA
NESUGI 2007 Orlando FL
Note: there is no primary key in the dataset, so any varibale value can be different, so need to select the observation which is changed from last month and also ignore any new observation in the current month dataset.
Thank you for all your help.
I don't see how this will be possible, if there is no key in the datasets. You say you want to select observations with changed values, and ignore new observations. If there is no key, how will you tell the difference between a changed observation and a new observation?
If what you want are records that are not completely matched in the two datasets, you could do the following program.
It only assumes that both dataset A and B have exactly the same variables.
It will create two datasets: a_only and b_only. But it does NOT tell you which b_only record is most likely to be associated with an a_only record.
data a_only (drop=rc);
set a end=end_of_a;
if _n_=1 then do;
declare hash b (dataset:'b');
b.definekey(all:'Y');
b.definedata(all:'Y');
b.definedone();
end;
rc=b.find();
if rc=0 then rc=b.remove(); /* It's a match: don't output A and remove from hash b*/
else output a_only;
if end_of_a then rc=b.output(dataset:'bonly');
run;
Edit note: Also I think there is an "except" operator in proc sql that could provide the unmatched records.
That is very complicated problem. You said there are not key variables, how can you measure the similarity between these two string. Why would you select NESUGI 2007 Orlando FL I think it is a new obs ,since it isn't included in the first table.
Even so, it probably makes a lot of sense to first eliminate exactly matched records.
Then, with the (hopefully) small set of unmatched records, do some sort of spelling distance or general editting distance between them. Or one could count the number/proportion of common words in every paired comparison.
Of course, this presumes that the op only expects 1 to 1 matches.
I would have started with
proc sql; create table want as select * from sasconf2 except select * from sasconf ; quit;
to show me everything in the second table not in the first. Which shows new records as well as changes.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.