Hello!
I am trying to learn loops using SQL and I'm trying to create a loop that joins a codeword column and adds it each time. Its kind of hard for me to put into words so here is my data and what i am trying to make it look like:
EXAMPLE OF DATA:
DATA1 TABLE
V1 V2 V3
4 2 1
6 6 5
2 1 6
CODEWORD TABLE:
CHAR CODE
1 PURPLE
2 BIRD
3 BRICK
4 LEFT
5 HIPPO
6 SIX
The number of columns for the data1 table is set by a macro variable NUMCOL, in the data example I set NUMCOL=3.
what I want it to do is create a table like this:
NEW DATA1 TABLE
V1 V2 V3 V1_CODE V2_CODE V3_CODE
4 2 1 LEFT BIRD PURPLE
6 6 5 SIX SIX HIPPO
2 1 6 BIRD PURPLE SIX
Here is what i am trying to make the loop do with PROC SQL
START OF LOOP
PROC SQL
CREATE TABLE DATA1 AS
SELECT DISTICT T1.V1
SELECT DISTICT T1.V2
SELECT DISTICT T1.V3
SELECT DISTICT T2.CODE AS V1_CODE
FROM DATA1 T1 LEFT JOIN CODEWORDS V2 ON (T1.V1 = T2.CHAR)
QUIT;
ITERATION 2:
CREATE TABLE DATA1 AS
SELECT DISTICT T1.V1
SELECT DISTICT T1.V2
SELECT DISTICT T1.V3
SELECT T1.V1_CODE
SELECT DISTICT T2.CODE AS V2_CODE
FROM DATA1 T1 LEFT JOIN CODEWORDS V2 ON (T1.V2 = T2.CHAR)
QUIT;
ITERATION 3
CREATE TABLE DATA1 AS
SELECT DISTICT T1.V1,
SELECT DISTICT T1.V2,
SELECT DISTICT T1.V3,
SELECT T1.V1_CODE ,
SELECT T1.V2_CODE,
SELECT DISTICT T3.CODE AS V3_CODE
FROM DATA1 T1
LEFT JOIN CODEWORDS V2 ON (T1.V2 = T2.CHAR)
QUIT;
basically i am trying to add a codeword to each variable, where one table is filled with variables that match the CHAR column on the codeword table, and loop it to how many dimensions I have, so if my dimension X is 3 it loops 3 times, and 4 for 4 times etc. Sorry if my explanation is convoluted, I've been trying to figure this out for a few hours now and I don't know how to get it to successfully loop like I want.
Any help would be appreciated, or if you have some ideas on how to do it differently that would also be great! Thanks!
Since SQL does not have a native "loop" construct you might reconsider rephrasing what to call this.
If the purpose is manipulating variables that's one thing.
If I were forced to do something like this in sql my approach:
proc format; value $codeword '1'='PURPLE' '2'='BIRD' '3'='BRICK' '4'='LEFT' '5'='HIPPO' '6'='SIX' ; data have; input v1 $ v2 $ v3 $; datalines; 4 2 1 6 6 5 2 1 6 ; proc sql; create table want as select v1,v2,v3 ,put(v1,$codeword.) as v1_code ,put(v2,$codeword.) as v2_code ,put(v3,$codeword.) as v3_code from have ; quit;
Though a data step would be more flexible because you can use variable lists with arrays and accomplish the same thing.
data want2; set have; array vv(*) v1-v3; array codeword_V(3) $ 10.; do i=1 to dim(vv); codeword_V[i] = put(vv[i],$codeword.); end; drop i; run;
Demonstrating one reason to end variable names in sequence numbers instead of burying them in the middle.
If you had 50 similar values to apply the format two the second datastep would require exactly two change of replacing "3" with "50". No matter how you approach it in SQL you will have to type out 50 variable names with the assignments.
And for most actual uses likely would use the format directly:
Proc print data=have; format v1-v3 $codeword.; run;
So single value lookups are likely not the best exercise for this.
Much too complicated. Create a format from your lookup table, and use that to assign the values for the new variables. This is best done in a data step, using arrays and a DO loop.
Since SQL does not have a native "loop" construct you might reconsider rephrasing what to call this.
If the purpose is manipulating variables that's one thing.
If I were forced to do something like this in sql my approach:
proc format; value $codeword '1'='PURPLE' '2'='BIRD' '3'='BRICK' '4'='LEFT' '5'='HIPPO' '6'='SIX' ; data have; input v1 $ v2 $ v3 $; datalines; 4 2 1 6 6 5 2 1 6 ; proc sql; create table want as select v1,v2,v3 ,put(v1,$codeword.) as v1_code ,put(v2,$codeword.) as v2_code ,put(v3,$codeword.) as v3_code from have ; quit;
Though a data step would be more flexible because you can use variable lists with arrays and accomplish the same thing.
data want2; set have; array vv(*) v1-v3; array codeword_V(3) $ 10.; do i=1 to dim(vv); codeword_V[i] = put(vv[i],$codeword.); end; drop i; run;
Demonstrating one reason to end variable names in sequence numbers instead of burying them in the middle.
If you had 50 similar values to apply the format two the second datastep would require exactly two change of replacing "3" with "50". No matter how you approach it in SQL you will have to type out 50 variable names with the assignments.
And for most actual uses likely would use the format directly:
Proc print data=have; format v1-v3 $codeword.; run;
So single value lookups are likely not the best exercise for this.
@togglefroggle wrote:
Hey! Thank you so much for the help, I honestly really appreciate it. I can see that the way I was thinking was very Pidgeon holed so I appreciate you taking the time to explain it in an easily understandable way. In the future I will try and find the simplest solution i can instead of relying on overly complicated processes
I do have one last quick question though if you do not mind, I see when you called proc format you set labels to the variable names, is there a way to do this automatically without writing each out for a large table? My codeword table is 2 columns with like 1000 different words in one column and random numbers assigned to each, is there a way to set it so that I can use the number and assign it a codeword label easily like in your proc format statements?
If you have a data set with the value pairs then with a little modification you can use it as a Cntlin optional data set with Proc Format to create the format. The Cntlin data set does have a specific format and required variable names.
The minimum variables you need to provide if it is a one to one look up like your example are:
FMTNAME: a character variable that holds the name of the format, no special characters, letters, digits and underscore only, and cannot end a number (would conflict with the w. element of format use)
Start: the value you have
Label: the text you want to display when the format is used
Type a single character of either C or N for character or numeric values.
An example using the previous values. I am creating a data set with two variables to hold the values needed. If your data set has more variables you may want to drop them when making the control set just to prevent accidentally using one of the 22 variable names that have special meaning in this context to proc format.
data list; input value $ codeword $; datalines; 1 PURPLE 2 BIRD 3 BRICK 4 LEFT 5 HIPPO 6 SIX ; data codewordcontrol; set list; fmtname = "Codeword"; type = "C"; rename value=Start codeword=label ; run; proc format cntlin=codewordcontrol; run;
If your variables are actually numeric, not clear from your example, use Type='N' to create a format that works with numbers and don't use the $ in front of the format name when used. When your values are numeric and want to do something with a range of values such as ages of 0 to 5 you would add an End value for the end of the range and could use a label like "5 and under". There are additional variables that control whether the end points are included or excluded for the comparisons before applying the label.
You can see the other variables by adding CNTLOUT=somedataset to create the data as Proc Format uses it. The result of a cntlout data set can be used as a cntlin.
Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.