BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
togglefroggle
Fluorite | Level 6

Hello!

 

I am trying to learn loops using SQL and I'm trying to create a loop that joins a codeword column and adds it each time. Its kind of hard for me to put into words so here is my data and what i am trying to make it look like:

EXAMPLE OF DATA:

 

DATA1 TABLE

V1 V2 V3 

4    2    1

6    6    5

2    1    6

 

CODEWORD TABLE:

CHAR     CODE

1           PURPLE

2           BIRD

3           BRICK

4           LEFT

5           HIPPO

6           SIX

 

The number of columns for the data1 table is set by a macro variable NUMCOL, in the data example I set NUMCOL=3.

what I want it to do is create a table like this:

NEW DATA1 TABLE

V1 V2 V3  V1_CODE  V2_CODE  V3_CODE 

4    2    1      LEFT        BIRD          PURPLE

6    6    5      SIX           SIX             HIPPO

2    1    6      BIRD        PURPLE     SIX

 

Here is what i am trying to make the loop do with PROC SQL

 

START OF LOOP

 

PROC SQL 

      CREATE TABLE DATA1 AS

      SELECT DISTICT T1.V1

      SELECT DISTICT T1.V2

      SELECT DISTICT T1.V3

      SELECT DISTICT T2.CODE AS V1_CODE

FROM DATA1 T1 LEFT JOIN CODEWORDS V2 ON (T1.V1 = T2.CHAR)

QUIT;

 

ITERATION 2: 

 

      CREATE TABLE DATA1 AS

      SELECT DISTICT T1.V1

      SELECT DISTICT T1.V2

      SELECT DISTICT T1.V3

      SELECT T1.V1_CODE

      SELECT DISTICT T2.CODE AS V2_CODE

FROM DATA1 T1 LEFT JOIN CODEWORDS V2 ON (T1.V2 = T2.CHAR)

QUIT;

 

ITERATION 3

 

      CREATE TABLE DATA1 AS

      SELECT DISTICT T1.V1,

      SELECT DISTICT T1.V2,

      SELECT DISTICT T1.V3,

      SELECT T1.V1_CODE ,

      SELECT T1.V2_CODE,

      SELECT DISTICT T3.CODE AS V3_CODE

      FROM DATA1 T1

      LEFT JOIN CODEWORDS V2 ON (T1.V2 = T2.CHAR)

QUIT;

 

basically i am trying to add a codeword to each variable, where one table is filled with variables that match the CHAR column on the codeword table, and loop it to how many dimensions I have, so if my dimension X is 3 it loops 3 times, and 4 for 4 times etc. Sorry if my explanation is convoluted, I've been trying to figure this out for a few hours now and I don't know how to get it to successfully loop like I want.

 

Any help would be appreciated, or if you have some ideas on how to do it differently that would also be great!  Thanks!

1 ACCEPTED SOLUTION

Accepted Solutions
ballardw
Super User

Since SQL does not have a native "loop" construct you might reconsider rephrasing what to call this.

 

If the purpose is manipulating variables that's one thing.

If I were forced to do something like this in sql my approach:

proc format;
value $codeword
'1'='PURPLE'
'2'='BIRD'
'3'='BRICK'
'4'='LEFT'
'5'='HIPPO'
'6'='SIX'
;
data have;
   input v1 $ v2 $ v3 $;
datalines;
4    2    1
6    6    5
2    1    6
;

proc sql;
   create table want as
   select v1,v2,v3
         ,put(v1,$codeword.) as v1_code 
         ,put(v2,$codeword.) as v2_code 
         ,put(v3,$codeword.) as v3_code 
   from have
   ;
quit;

Though a data step would be more flexible because you can use variable lists with arrays and accomplish the same thing.

data want2;
   set have;
   array vv(*) v1-v3;
   array codeword_V(3) $ 10.;
   do i=1 to dim(vv);
      codeword_V[i] = put(vv[i],$codeword.);
   end;
   drop i;
run;

Demonstrating one reason to end variable names in sequence numbers instead of burying them in the middle.

If you had 50 similar values to apply the format two the second datastep would require exactly two change of replacing "3" with "50". No matter how you approach it in SQL you will have to type out 50 variable names with the assignments.

 

And for most actual uses likely would use the format directly:

Proc print data=have;
   format v1-v3 $codeword.;
run;

So single value lookups are likely not the best exercise for this.

 

View solution in original post

4 REPLIES 4
Kurt_Bremser
Super User

Much too complicated. Create a format from your lookup table, and use that to assign the values for the new variables. This is best done in a data step, using arrays and a DO loop.

ballardw
Super User

Since SQL does not have a native "loop" construct you might reconsider rephrasing what to call this.

 

If the purpose is manipulating variables that's one thing.

If I were forced to do something like this in sql my approach:

proc format;
value $codeword
'1'='PURPLE'
'2'='BIRD'
'3'='BRICK'
'4'='LEFT'
'5'='HIPPO'
'6'='SIX'
;
data have;
   input v1 $ v2 $ v3 $;
datalines;
4    2    1
6    6    5
2    1    6
;

proc sql;
   create table want as
   select v1,v2,v3
         ,put(v1,$codeword.) as v1_code 
         ,put(v2,$codeword.) as v2_code 
         ,put(v3,$codeword.) as v3_code 
   from have
   ;
quit;

Though a data step would be more flexible because you can use variable lists with arrays and accomplish the same thing.

data want2;
   set have;
   array vv(*) v1-v3;
   array codeword_V(3) $ 10.;
   do i=1 to dim(vv);
      codeword_V[i] = put(vv[i],$codeword.);
   end;
   drop i;
run;

Demonstrating one reason to end variable names in sequence numbers instead of burying them in the middle.

If you had 50 similar values to apply the format two the second datastep would require exactly two change of replacing "3" with "50". No matter how you approach it in SQL you will have to type out 50 variable names with the assignments.

 

And for most actual uses likely would use the format directly:

Proc print data=have;
   format v1-v3 $codeword.;
run;

So single value lookups are likely not the best exercise for this.

 

togglefroggle
Fluorite | Level 6
Hey! Thank you so much for the help, I honestly really appreciate it. I can see that the way I was thinking was very Pidgeon holed so I appreciate you taking the time to explain it in an easily understandable way. In the future I will try and find the simplest solution i can instead of relying on overly complicated processes

I do have one last quick question though if you do not mind, I see when you called proc format you set labels to the variable names, is there a way to do this automatically without writing each out for a large table? My codeword table is 2 columns with like 1000 different words in one column and random numbers assigned to each, is there a way to set it so that I can use the number and assign it a codeword label easily like in your proc format statements?
ballardw
Super User

@togglefroggle wrote:
Hey! Thank you so much for the help, I honestly really appreciate it. I can see that the way I was thinking was very Pidgeon holed so I appreciate you taking the time to explain it in an easily understandable way. In the future I will try and find the simplest solution i can instead of relying on overly complicated processes

I do have one last quick question though if you do not mind, I see when you called proc format you set labels to the variable names, is there a way to do this automatically without writing each out for a large table? My codeword table is 2 columns with like 1000 different words in one column and random numbers assigned to each, is there a way to set it so that I can use the number and assign it a codeword label easily like in your proc format statements?

If you have a data set with the value pairs then with a little modification you can use it as a Cntlin optional data set with Proc Format to create the format. The Cntlin data set does have a specific format and required variable names.

The minimum variables you need to provide if it is a one to one look up like your example are:

FMTNAME:  a character variable that holds the name of the format, no special characters, letters, digits and underscore only, and cannot end a number (would conflict with the w. element of format use)

Start:   the value you have

Label:  the text you want to display when the format is used

Type  a single character of either C or N for character or numeric values.

 

An example using the previous values. I am creating a data set with two variables to hold the values needed. If your data set has more variables you may want to drop them when making the control set just to prevent accidentally using one of the 22 variable names that have special meaning in this context to proc format.

 

data list;
  input value $ codeword $;
datalines;
 1   PURPLE 
 2   BIRD 
 3   BRICK 
 4   LEFT 
 5   HIPPO 
 6   SIX 
;

data codewordcontrol;
   set list;
   fmtname = "Codeword";
   type = "C";
   rename value=Start
          codeword=label
   ;
run;

proc format cntlin=codewordcontrol;
run;

If your variables are actually numeric, not clear from your example, use Type='N' to create a format that works with numbers and don't use the $ in front of the format name when used. When your values are numeric and want to do something with a range of values such as ages of 0 to 5 you would add an End value for the end of the range and could use a label like "5 and under". There are additional variables that control whether the end points are included or excluded for the comparisons before applying the label.

 

You can see the other variables by adding CNTLOUT=somedataset to create the data as Proc Format uses it. The result of a cntlout data set can be used as a cntlin.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

Mastering the WHERE Clause in PROC SQL

SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.

Find more tutorials on the SAS Users YouTube channel.

Discussion stats
  • 4 replies
  • 3588 views
  • 2 likes
  • 3 in conversation