BookmarkSubscribeRSS Feed
agrocrag14
Fluorite | Level 6

I have a program that creates multiple data sets(which I will call lists), each dataset gets formatted and has unique values that work like record numbers. Each list I designate a list_name and format and sort accordingly. I then create an aggregate table where all formatted tables are stacked together (I will call this the load file) My question is I need to back this up against a master data set, which is essentially the initial/original load file of this data located in oracle. Every day I need to see what records need to be added or removed in a master data set for each list_name in the current data run, meaning I generate this load file frequently and bump it up against the mater dataset. I need to flag those records (to be removed, to be added) in the load file, and create control stats table that essentially tabulates how many records were added or removed for each list.  Attached  is an example of the data.

 

 This is what the aggregate load file data generally looks like. (A)  on excel attachment

 

Below are counts for each 20 something lists in the load file:(B) on excel attachment

 

Each list value is unique for that list_name,  but a list_value can have multiples (belong to more than one list). To reiterate, I run for today's date, time stamp the data, run for each list name, then stack them all together. The goal is to take that stacked formatted load file and back it up against the master list, which shows the list_name, list_values for the prior run, bump current run, flag what records in the master list need to be added or removed. I  need to create flags in the 'load file' to communicate with another group that manages the master list in oracle, so alternatives to this methodology aren't helpful.

 

if I have access to the master file (e.g. sasdata.Master), how do I check the load file against sasdata.Master and flag which records are to be added or removed for each list in that load file so the folks managing sasdata.Master can update?

 

 

Thank you in advance for your time and help!

7 REPLIES 7
Steelers_In_DC
Barite | Level 11

Rather than an explanation like that it's best to show what you have, and the output you are looking for:

 

Data have;

input var1;

cards;

.

.

.

;run;

 

Data want;

input var1;

cards;

.

.

.

;run;

agrocrag14
Fluorite | Level 6

thank you, below:

data load;

set name_1 name_2, name_3 etc;

run;

 

list_value (unique char string)    list_name          timestamp

345335                                       name_1

345666                                       name_1

644466                                       name_2

433644                                       name_3

 

(step to look up master file here?)

 

list_value (unique char string)    list_name          timestamp    add    remove

345335                                       name_1                                 Y        N

345666                                       name_1                                 N        Y

644466                                       name_2                                 Y        N

433644                                       name_3                                 N        N

 

then count the add/removes for each list

agrocrag14
Fluorite | Level 6

also, the master file I need to back up against is the same format.

Steelers_In_DC
Barite | Level 11

I'm not sure what it is you need, maybe this will help set you in the right direction:

 

data have1;
infile cards dsd;
input id var:$35.;
cards;
1,Steelers Are Good At Football
2,Bengals are not good at football
;run;

data have2;
infile cards dsd;
input id var:$35.;
cards;
1,Steelers Are Good At Football
3,NewVar
;run;

data want;
merge have1(in=a)
      have2(in=b);
by id;
if a and b then NewVar = 'Same';
if a and not b then newvar = 'Drop';
if b and not a then newvar = 'Add';
run;

 

Go Steelers,

 

Mark

agrocrag14
Fluorite | Level 6

Thank you! The only outstanding question I have is the unique key values are list_value and list_name. So for instance:

 

      File 1                File2

--------------       -----------------

ID    NAME     ID    NAME

--------------      -------------------

2     name1     2      name1        

3     name1     3      name4  

3     name2     6      name5

4     name3     ..etc

 

the COMBO of ID and Name is unique. Could there be a do loop of this merge by list name in the merge statement you have above? Or would I have to break apart each list by list name and apply this merge (not favorable because there are some 20 list names, these would have to be done as 20 merge statements then combined).

Reeza
Super User

You need a full join between the new table and the updated table based on id and list. 

 

Then you can flag each record, based on your criteria using a case statement. 

 

For code you'll need to provide better sample data - a sample of your master data, changed data and the output data.

I could make fake data, but then I'm making assumptions so it's better if you do....

 

What they're looking to do is an UPDATE in SQL though, so unless your tables are huge it may be easier to give them the new table that can then be used to automatically update the table. It's easier to code for them in the long run. 

agrocrag14
Fluorite | Level 6
What would a full join look like? It looks like a match merge is getting me close to what I need to accomplish but I need a combo of the two variables as a key value can appear more than once (belong to 2 list_names).

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1861 views
  • 0 likes
  • 3 in conversation