BookmarkSubscribeRSS Feed
bignate1030
Fluorite | Level 6

I have three types of tables, one that contains a list of events (event_data), one that holds a list of aggregation codes (value_translation), and one that shows which aggregation table each event corresponds to (variable_translation). (All stored in an Oracle database, thus the Proc Sql). I would like to have some dummy variable that I can say var1=event1, and then I already have a ready-made program using the dummy variable that will extract all the data for that single event (from event_data), figure out the aggregation table that is needed (from variable translation) and then extract that aggregation table (from value_translation). For step 2, the aggregation table name is stored in a variable named order_group. This example code will be wrong but hopefully it will help get the point across. 

 

Step 1:

dum_var = eventA;

 

proc sql;

create table eventA as

select * from cit_udm.event_data where event = dum_var;

 

(cit_udm is the user in oracle)

 

Step 2:

/** what i'm really lost on **/

look at variable order_group in table variable_translation and find the value that corresponds to eventA, then save that value in another dummy variable dum_var2, for use in step 3.

/*****************************/

 

Step 3: 

create table dum_var2 as 

select * from cit_ct_stg.value_translation

where table = dum_var2;

 

(cit_ct_stg is a different user in oracle)

 

Thank you in advance!

11 REPLIES 11
bignate1030
Fluorite | Level 6

I am relatively new to SAS so I haven't come across formats yet, can you explain how they work?

Tom
Super User Tom
Super User

You need to provide some simple example data to make your point clearer.  Please post it using a simple data step to create the sample data so others can play with it.  Make sure to use the Insert Code or Insert SAS Code icons on the forum editor to get a pop-up window where you can paste/edit your code. That will prevent the forum editor from trying to format your code as if it was text paragraphs.

 

Sounds like you need to use some code generation.  One simple way is to use a macro variable to make your code dynamic.  The basic idea of a macro variable is that it is text that can be substituted. So you set a value into the macro variable and then reference it by using an & before its name and the generated result is then treated by SAS itself as the code to be run. 

 

Sounds like your problem is that based on the value of ??? in one table you can determine the NAME of the other table.  So let's assume that ORDER_GROUP is the variable in the first table that has the name of the table for the second step.  So let's pull it out into a macro variable named TABLENAME and then use &TABLENAME in the FROM clause of the next query.  If it is not exactly the table name then you can make the code include whatever logic is needed to transform the value into the table name.

 

proc sql noprint;
select order_group into :tablename trimmed 
from table_one 
;
create table want as 
select * from &tablename
;
quit;

 

 

 

bignate1030
Fluorite | Level 6

 

/* Table 1 */
data event_data;
    input id $ name $; 
    datalines;
124 event_a
632 event_b
734 event_c
;

/* Table 2 */
data variable_definition;
    input name $ agg_table $;
    datalines;
event_a agg_table_1
event_b agg_table_2
event_c agg_table_3
; 

/* Table 3 */
data name_of_chosen_agg_table;
    input agg_table $ agg_value $;
    datalines;
agg_table_1 3
agg_table_2 7
agg_table_3 9
;

So what I would like to have happen is I can set a dummy variable equal to event_a and I only extract event_a values from table 1. Then table 2 shows what agg_table corresponds to event_a, and from there table 3 is created with only values corresponding to agg_table_1. But then I'm done with event_a and now I want event_b without having to change every single instance of where event_b would show up. I can just change the stored value in the dummy variable. I also don't need table 2 to actually be produced, it just acts as a go-between for tables 1 and 3.   

 

** I apologize for the wonky bit at the end I can't figure out why it isn't formatting properly

Tom
Super User Tom
Super User

I don't see where "name_of_chosen_agg_table" comes from unless it is hard coded.

Is there some place to find that name?

 

Otherwise it looks like you just want to link event_data to variable_definition based on matching values of NAME and then link to name_of_chosen_agg_table based on values of AGG_TABLE.

 

Or are you saying that "name_of_chosen_agg_table" is the input and you want to follow the links in the other direction to find NAME="event a" and ID=124?

bignate1030
Fluorite | Level 6

name_of_chosen_agg_table corresponds to whatever event I am looking at. So in my program I write some code to extract data just corresponding to event_a, I want "name_of_chosen_agg_table" to be the corresponding agg_table for the chosen event (in this case agg_table_1). But in my program I would like to not have to hardcode this name, among other things, every time I want to look at different data. So I finish looking at all the data related to event_a, now I want to look at event_b. How can I write my code so that I only need to change a line or two, and everything else updates accordingly, if this is possible.

Tom
Super User Tom
Super User

Still clear as mud.  If you are READING form existing tables then I suspect that your actual table names look more like what @PGStats posted.  If you are creating new tables then explain what table name you want to create.

 

Please show one or two fully worked examples with actual table and variable names so that the pattern of the code you want to generate is is clearer.

bignate1030
Fluorite | Level 6

This code creates the event_data table

PROC SQL;
CREATE TABLE event_data AS 
  SELECT * FROM WORK.KVS_EVENT_DATA t1
  WHERE t1.KEY LIKE 'LYMPOST_DS_LYM_DIS_STAT_BY_CT%'
  ORDER BY PERSON_ID, MAIN_EVENT_ID, EVENT_ID
;
QUIT;

 

which looks like:

event_data.PNG

 

Where "Lympost_ds_lym_dis_stat_by_ct" is the general variable that I am interested in.

 

Then I look at the following table:

variable definition.PNG 

where I see that the aggregation_order_group for my desired variable is "CD_Lymstaps".

 

So I use this code 

CREATE TABLE CD_LYMSTAPS AS
SELECT * FROM CONNECTION TO dbconn
	(SELECT * FROM CIT_CT_STG.CT_VALUE_TRANSLATION 
		WHERE CDE_TABLE = 'CD_LYMSTAPS');
 
disconnect from dbconn;

To create the following table: 

Lymstaps.PNG

And I end up merging the two sas outputs for further analysis. I am looking for how to modify the code so that I am can specify any variable and the corresponding tables update accordingly. Going through this at first I just had to look at this one variable so for the variable_definition table (middle) I was able to just look up the aggregation_order_group value in Oracle and go from there. However I do not wish to look up that value for every variable I need to analyze as I will be going through hundreds of these variables in the near future. 

Tom
Super User Tom
Super User

Still a little long on photographs and short on names of tables and variables.

It looks like this is the killer step:

Then I look at the following table:

Can you explain more how that step works? How do you LOOK at it? How did you know that is the right dataset to look at? Is it the only dataset?  If not then what criteria do you use choose it?  I cannot see that photograph of your screen. Can you just provide the variable names as text?  What do you see if you run PROC CONTENTS on the dataset?  How did you pick  the CD_LYMSTAPS value?  Can you explain the rule?

 

You have provided a little bit of code we can use to demonstrate how the SAS macro processor can help you.

CREATE TABLE CD_LYMSTAPS AS
  SELECT * FROM CONNECTION TO dbconn
    (SELECT * FROM CIT_CT_STG.CT_VALUE_TRANSLATION 
     WHERE CDE_TABLE = 'CD_LYMSTAPS'
    )
;

So it looks like you want to use the string of characters CD_LYMSTAPS as both the name of a table to generate and as the value to use in a WHERE clause.   So normally you would just put that value into a macro variable and replace the text in the code with the reference to the macro variable.

%let varname=CD_LYMSTAPS;

CREATE TABLE &varname. AS
  SELECT * FROM CONNECTION TO dbconn
    (SELECT * FROM CIT_CT_STG.CT_VALUE_TRANSLATION 
     WHERE CDE_TABLE = "&varname."
    )
;

But the use of explicit pass thru SQL might cause a problem. The macro processor does not operate on macro triggers inside of single quotes, but many external database systems only allow single quotes around string literals.  One easy fix is to use %BQUOTE() function will will add macro quotes round the single quotes and allow the macro variable reference to be resolved.  So the where clause could be written like this instead.

     WHERE CDE_TABLE = %bquote('&varname.')

Don't worry about the macro quoting it will not be passed to the remote database, just the actual code.

 

If you can explain the rule used to select that particular value of CD_LYMSTAPS then perhaps we can figure out a way to set the macro variable via code.

bignate1030
Fluorite | Level 6

The middle table comes from the Oracle database that I am using, sorry I forgot to mention that. My task was to devise a program that can take information out of the database and transform it using SAS enterprise guide. My supervisor told me to look specifically at the 'lympost_ds_lym_dis_stat_ct' variable. I was able to complete this task. However the program is written specifically for that variable. Names of tables, joins, the transpose, etc. all relate directly to this variable. My supervisor has now asked me to write code that can do the same transformations for any variable. I could achieve this by re-hardcoding all the names (tables, joins, etc.) in the original program but I have hundreds of variables I potentially need to transform so re-hardcoding everything would be a waste of my time. I am trying to write a program where I can change the value stored in one macro variable and all the other places I would need to re-code have been replaced by the macro variable. Using the middle table (variable_definition) as a part of this process is where I am having problems. Here is what needs to happen:

All of my variables are stored in the event_data table.

My supervisor tells me to transform a specific variable (lympost_ds_lym_dis_stat_ct).

I filter the values in the event_data table so that I am left with just the desired variable as show here:

PROC SQL;
CREATE TABLE event_data AS 
  SELECT * FROM WORK.KVS_EVENT_DATA t1
  WHERE t1.KEY LIKE 'LYMPOST_DS_LYM_DIS_STAT_BY_CT%'
  ORDER BY PERSON_ID, MAIN_EVENT_ID, EVENT_ID
;
QUIT;

And the resulting table:

event_data.PNG

 

Now I need to join this table with an aggregation table. Since the original program is written for this one specific variable, I did not write code to find the aggregation table. I just went into my Oracle database and found the name of the table I needed through their point and click filter options, which is CD_Lymstaps. This is the table that gave me that information (variable_definition):

variable definition.PNG

So from this table it can be seen that for my variable (lympost_ds_lym_dis_stat_ct, second from left), the corresponding aggregation table is CD_Lymstaps (stored under aggregation_order_group, far right). From here, I extract this aggregation table into SAS EG from Oracle using this code:

 

CREATE TABLE CD_LYMSTAPS AS
SELECT * FROM CONNECTION TO dbconn
	(SELECT * FROM CIT_CT_STG.CT_VALUE_TRANSLATION 
		WHERE CDE_TABLE = 'CD_LYMSTAPS');
 
disconnect from dbconn;

Which results in this table (CD_Lymstaps):

Lymstaps.PNG

Now I merge my CD_Lymstaps table and event_data table for further transformation and analysis. 

 

My issue is this: I have hundreds of variables I will need to look at. I would like to define one macro variable at the beginning of my code, it could be anything. Lets pretend varname=lympost_ds_lym_dis_stat_ct. Now I need a program that will use varname to extract all values related to lympost_ds_lym_dis_stat_ct, and store it in a table (event_data).

Furthermore, I need code that uses varname to access my variable_definition table in Oracle and for my variable (lympost_ds_lym_dis_stat_ct), find the corresponding value under aggregation_order_group, in this case 'CD_Lymstaps' and save this variable in another macro, varname2. So now varname2=CD_Lymstaps.

Now for the last part, I need to create a table that has the name of whatever is stored in varname2, and this table contains all the values corresponding to the value stored in varname2, in this case (CD_Lymstaps) 

PGStats
Opal | Level 21

Here is a working template:

 

data event_data;
    input id name $; 
    datalines;
124 event_a
632 event_b
734 event_c
;

data variable_definition;
    input name $ agg_table $32.;
    datalines;
event_a agg_table_1
event_b agg_table_2
event_c agg_table_3
; 

data agg_table_1;
input id value;
datalines;
124 1
632 999
734 999
;

data agg_table_2;
input id value;
datalines;
124 999
632 2
734 999
;

data agg_table_3;
input id value;
datalines;
124 999
632 999
734 3
;

proc sql /*noprint*/;
select 
    cats("when """, name, catx(" ", """ then", agg_table), ".value"),
    catx(" ", "left join", agg_table, "on event_data.id =", cats(agg_table, ".id"))
into :cases separated by " ", :joins separated by " "
from variable_definition;
quit;

proc sql;
select 
    event_data.id,
    case event_data.name
    &cases. end as value
from
    event_data
    &joins.;
quit;
PG

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 11 replies
  • 3041 views
  • 0 likes
  • 4 in conversation