I have a program that creates multiple data sets (which I will call lists). Each data set gets formatted and has unique values that work like record numbers. I designate each list a list_name, then format and sort accordingly. I then create an aggregate table where all the formatted tables are stacked together (I will call this the load file).

My question: I need to compare this against a master data set, which is essentially the initial/original load file of this data, located in Oracle. Every day I need to see, for each list_name in the current data run, which records need to be added to or removed from the master data set; in other words, I generate this load file frequently and bump it up against the master data set. I need to flag those records (to be removed, to be added) in the load file, and create a control stats table that tabulates how many records were added or removed for each list. Attached is an example of the data.
This is what the aggregate load file data generally looks like ((A) on the Excel attachment).
Below are counts for each of the 20-something lists in the load file ((B) on the Excel attachment):
Each list_value is unique within its list_name, but a list_value can appear in more than one list. To reiterate: I run for today's date, time-stamp the data, run for each list_name, then stack them all together. The goal is to take that stacked, formatted load file and compare it against the master list (which holds the list_name and list_value pairs from the prior run), bump the current run against it, and flag which records in the master list need to be added or removed. I need to create flags in the load file to communicate with another group that manages the master list in Oracle, so alternatives to this methodology aren't helpful.
If I have access to the master file (e.g. sasdata.Master), how do I check the load file against sasdata.Master and flag which records are to be added or removed for each list in that load file, so the folks managing sasdata.Master can update it?
Thank you in advance for your time and help!
Rather than an explanation like that, it's best to show what you have and the output you are looking for:
data have;
input var1;
cards;
.
.
.
;
run;

data want;
input var1;
cards;
.
.
.
;
run;
thank you, below:
data load;
set name_1 name_2 name_3 /* ...etc. */;
run;
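As a side note: if every per-list data set really does share the name_ prefix (an assumption on my part), the "etc." can be avoided entirely with a name-prefix list on the SET statement, which picks up all matching data sets in the library:

```sas
/* Sketch: stacks every WORK data set whose name starts with name_ */
data load;
   set name_: ;
run;
```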
list_value (unique char string)   list_name   timestamp
345335                            name_1
345666                            name_1
644466                            name_2
433644                            name_3
(step to look up master file here?)
list_value (unique char string)   list_name   timestamp   add   remove
345335                            name_1                  Y     N
345666                            name_1                  N     Y
644466                            name_2                  Y     N
433644                            name_3                  N     N
Then count the adds/removes for each list.
Also, the master file I need to compare against has the same format.
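The lookup step described above could be sketched along these lines. This is only a sketch under my assumptions: both tables carry list_name and list_value as the key, the master table is readable as sasdata.Master, and the data set and variable names are illustrative:

```sas
/* Sort both inputs on the composite key */
proc sort data=load; by list_name list_value; run;
proc sort data=sasdata.master out=master; by list_name list_value; run;

data flagged;
   merge load   (in=in_load)
         master (in=in_master);
   by list_name list_value;
   length add remove $1;
   add    = 'N';
   remove = 'N';
   if in_load   and not in_master then add    = 'Y';  /* in current run only  */
   if in_master and not in_load   then remove = 'Y';  /* in master only       */
run;

/* Control stats: adds/removes tabulated per list */
proc freq data=flagged;
   tables list_name*add list_name*remove / nopercent norow nocol;
run;
```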
I'm not sure what it is you need; maybe this will help set you in the right direction:
data have1;
infile cards dsd;
input id var :$35.;
cards;
1,Steelers Are Good At Football
2,Bengals are not good at football
;
run;

data have2;
infile cards dsd;
input id var :$35.;
cards;
1,Steelers Are Good At Football
3,NewVar
;
run;
data want;
merge have1(in=a)
      have2(in=b);
by id;
length newvar $4;
if a and b then newvar = 'Same';
else if a and not b then newvar = 'Drop';
else if b and not a then newvar = 'Add';
run;
Go Steelers,
Mark
Thank you! The only outstanding question I have is that the unique key is the combination of list_value and list_name. So for instance:
File 1              File 2
--------------      --------------
ID    NAME          ID    NAME
--------------      --------------
2     name1         2     name1
3     name1         3     name4
3     name2         6     name5
4     name3         ...etc
The combination of ID and NAME is unique. Could there be a DO loop of this merge by list name in the merge statement you have above? Or would I have to break apart each list by list_name and apply this merge separately (not favorable, because there are some 20 list names; these would have to be done as 20 merge statements and then combined)?
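No loop or per-list split should be needed for a composite key: the same merge works if you sort and merge BY both key variables. A sketch, assuming the variables are named list_name and list_value (adjust to your actual names):

```sas
/* Sort both tables on the composite key */
proc sort data=master; by list_name list_value; run;
proc sort data=load;   by list_name list_value; run;

data want;
   merge master (in=a)
         load   (in=b);
   by list_name list_value;
   length status $4;
   if a and b then status = 'Same';
   else if a  then status = 'Drop';  /* in master only: remove */
   else            status = 'Add';   /* in load only: add      */
run;
```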
You need a full join between the new table and the updated table, based on id and list.
Then you can flag each record, based on your criteria, using a CASE statement.
For code you'll need to provide better sample data: a sample of your master data, your changed data, and the desired output data.
I could make fake data, but then I'm making assumptions, so it's better if you do...
What they're looking to do is an UPDATE in SQL, though, so unless your tables are huge it may be easier to give them the new table, which can then be used to automatically update the master table. It's easier for them to code in the long run.
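The full-join-plus-CASE approach described above might be sketched like this in PROC SQL. The table and variable names here are assumptions, not the poster's actual names:

```sas
proc sql;
   create table flagged as
   select coalesce(l.list_name,  m.list_name)  as list_name,
          coalesce(l.list_value, m.list_value) as list_value,
          case when m.list_value is null then 'Add'   /* only in new load  */
               when l.list_value is null then 'Drop'  /* only in master    */
               else 'Same'
          end as status
   from load l
        full join master m
        on  l.list_name  = m.list_name
        and l.list_value = m.list_value;
quit;
```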