BookmarkSubscribeRSS Feed
SmcGarrett
Obsidian | Level 7

Hello all,

 

I apologize because I know my question may be vague but I simply do not know what the words are that describe what I am wanting to do. 

 

Here's the situation:

 

I am receiving about 6 different data sets that need to be filtered and standardized for a statistical process. Some data is received as a text file, some as an excel file...etc. 

 

Some data sets have more information (columns) than the others and some have less. 

 

Depending on what is given in these data sets, I am filtering and manipulating the data until I get it into a standard form.... 

 

Example: 

 

Data set 1: has the time of Visits and orders but no CID, therefore, we cannot determine which visitor ordered and how long it took for them to order. 

 

This data set would simply list the amount of visits per hour and the amount of orders per hour.

 

Data set 2: has the time of Visits and orders AND a CID, therefore, we CAN determine which visitor ordered and how long it took for them to order.

 

This data set would have the amount of visits per hour and the amount of orders per hour with the orders being sourced back to the hour at which the customer initially visited. 

 

At the end of the day, the data sets will look exactly alike once I've compiled the numbers by hour. 

 

So what I want to do is essentially write a program where I can use a True/False in order to do additional or different steps. 

 

If dataset contains CIDs THEN do:

Step A

Step B

Step C

Step D

 

if dataset does not contain CIDs THEN do:

Step E

Step F 

Step G

Step H

 

After step D or H then do:

Step I

Step J

Step K

 

Any help on this? 

 

 

 

7 REPLIES 7
TomKari
Onyx | Level 15

That's a pretty sparse description of your requirement! First question; are you trying to do this all with the tasks in Enterprise Guide, or are you comfortable writing some SAS code?

 

SmcGarrett
Obsidian | Level 7

I am unfortunately forced to work in enterprise guide but I only use the coding feature. To be honest, I don't even know how to do the click features. So yes, I am comfortable writing code. 

 

The description is simple.... 

 

There are X amounts of initial formats for data that I am processing. Currently, I have a different code for each of the types of data sets. Instead of having X amounts codes, I would rather have one. That way if there are 5 different types of programs/codes that are filtering and formatting the data, I can only call ONE program and indicate which program it should run based on a macro call. 

 

So lets say I write a macro program which I am very comfortable with and it is %format_converter.

 

And I have 3 different Data set formats 

 

Data Format 1:

Text Format

Pacific Time Zone 

No Customer IDs

Visitors

 

Data Format 2:

Excel Format

Eastern Time Zone

Customer IDS

Visitors

Orders 

 

Data Format 3:

Text Format

Eastern Time Zone

No Customer IDs

Visitors

Orders 

 

The final format of all of these files should be:

 

Final Format:

CSV Format

Eastern Time Zone

Visitors Per Minute

Orders Per Minute

 

 

%macro format_converter(

File,

Start_Date,

End_Date,

Data_Format Type)

/ store source

 

There will be 3 choices for Data_Format Type and it will dictate which macro is called to format the data set.

 

Data set 1:

Needs to be read in with infile statement

Needs to be converted to Eastern Time

No CIDs and no orders so very simple, just sum visits by minute.

Orders column will be 0

FINISHED

Data set 2:

Needs to be read in with proc import

Already in Eastern Time so no conversion necessary.

CID’s on the visit and order data track which customers bought what.

If visit comes in a 3:05pm and the order comes in at 3:56 pm, the order will be sourced back to 3:05

Visits and SOURCED orders compiled by minute.

FINISHED

Data set 3:

Needs to be read in with infile statement

Already in Eastern Time so no conversion necessary

There are orders and visits but no CIDs

Visits and UNSOURCED orders are compiled by minute

Finished

 

 

So as you can see the data is similar for a lot of the formats but since they differ slightly they need different formatting and prepping for analysis.

 

Currently, I have ALL of these codes created and am using them daily in order to manage the files that are coming in. Instead of having all these codes that are doing ALMOST the same thing except for minor differences, I would like to create ONE code, %format, and then essentially say:

If this format then follow these instructions

If that format then follow those instructions

….

 

 

Reeza
Super User

It sounds like basic if/then logic to me. 

 

I like to start by defining my macro call ie, what do you want to provide to the macro and what do you want to automate. 

For example should the macro determine file type or will you provide that?

whats common to all, what's unique to each? 

 

Then start creating a skeleton code that does the if/then and make sure it goes into each section correctly. 

 

This is hard to explain...I'll mock up an example in a bit. 

SmcGarrett
Obsidian | Level 7

If I have 3 macro codes for 3 different types of data sets:

 

%Converter1(File, Start_Date)

%Converter2(File, Start_Date)

%Converter3(File, Start_Date)

 

Then I would want my master macro code to be something like:

 

%Formatter(File, Start_Date, Converter_Type).

 

So that if I were to write:

 

%Formatter(//FILELOCATION.xlsx, 2AUG2016, Converter1)

 

It would run %Converter1.

 

So then my %Formatter Code would look something like

 

%Formatter(File, Start_Date, Converter_Type) / store source;

 

if Converter_Type = &Converter_Type, then %Converter1(&File, &Start_Date)

if Converter_Type = &Converter_Type , then %Converter2(&File, &Start_Date)

if Converter_Type = &Converter_Type , then %Converter3(&File, &Start_Date)

 

Does that make sense? 

 

 

 

Reeza
Super User
/*Switch from test to full macro call*/
%macro formatter(file_type, file, start_date);

%if &file_type=1 %then %do;
    %converter1(&file, &start_date) ;
%end;

%else %if &file_type=2 %then %do;
    %converter2(&file, &start_date);
%end;

%else %if &file_type=3 %then %do;
    %converter2(&file, &start_date);
%end;

%else %do;
    %put Invalid Entry;
%end;

%*Any other code goes afterwards;

%mend;
ballardw
Super User

If your file names are distinctive enough you might attempt to parse the names to use an appropriate If as in @Reeza's example. However you may spend a lot of time maintaining that as the symptom of multiple file types indicates this process may be a tad fluid going into the future.

 

My projects with somewhat similar behavior have data files arriving on roughly monthly intervals (not regular at that) so I have individual programs to read each instead of trying to shoehorn them into a single program. I then combine as needed.

 

 

RW9
Diamond | Level 26 RW9
Diamond | Level 26

I agree with @ballardw here.  Whilst technically it is feasible - you could use macro, or a data _null_ off the sashelp.vcolum dataset to generate the code, it is the process which is the problem here.  Do you import agreements setup for the files - if not then your process is on very shaky ground to start with, how do you know when files change, what happens if the structure changes etc.  Keyword here is documentation, if I need to get a file from someone I will spend 99% of my time for that taks ensuring an agreement is in place detailing the file(s) required, the structure of each, shipping times etc.  This it the basis of any process going forward.

 

Once you have that in place, the rest of it is simple, create you import programs - can import multiple files using same bits of code, apply any needed processing, done.  All my data nicely processed into the format I need using the agreed upon process.  Any change to the data - well, thats not in the agreement and a new one needs to take place (good for discussing budgets and such like).

 

If you don't do that, next transfer they could just send you a picture of the data, or maybe just print it off and post it to you.  Control is everything, documentation is everything. 

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

SAS Enterprise Guide vs. SAS Studio

What’s the difference between SAS Enterprise Guide and SAS Studio? How are they similar? Just ask SAS’ Danny Modlin.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1377 views
  • 4 likes
  • 5 in conversation