Hi, I am very new to Snowflake. I am trying to create a SAS dataset from a snowflake table using SAS/ACCESS interface to Snowflake. I can successfully connect to Snowflake database and can create a SAS dataset without errors. Problem is that some of the columns are defined as arrays in Snowflake and after bringing data Snowflake databases into SAS environment, SAS reads array objects as a string where each element of original Snowflake array column is quoted and concatenated inside of a square bracket. I would like to read each element of the Snowflake array as a column in SAS. To add to the complexity a Snowflake column array may have different number of elements in each row.
This is what I have:
ID | Snowflake Array Column: HCPCS_CD | PROC PRINT showing SAS column as a string: HCPCS_CODE |
1 | J3490, J3590, AAAXT | ["J3490", "J3590", "AAAXT"] |
2 | A990, B210 | ["A990", "B210"] |
3 | C220, B210, J3490, J3590 | ["C220", "B210", "J3490", "J3590"] |
I would like SAS dataset to have variable HCPCS_CD into multiple character columns one for each element of the array:
ID |
HCPCS_CD1 | HCPCS_CD2 | HCPCS_CD3 | HCPCS_CD4 |
1 | J3490 | J3590 | AAAXT | |
2 | A990 | B210 | ||
3 | C220 | B210 | J3490 | J3590 |
If SAS can read a Snowflake array column directly as a SAS array that would work too.
Thanks and appreciate any help.
Yes. You need to write a piece of code to get job done.
data have;
input ID HCPCS_CD $50. ;
cards;
1 ["J3490", "J3590", "AAAXT"]
2 ["A990", "B210"]
3 ["C220", "B210", "J3490", "J3590"]
;
*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max trimmed from have;
quit;
*Achieve it;
data want;
set have;
array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
do i=1 to countw(HCPCS_CD,',"[] ','q');
x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
end;
drop i;
run;
You will probably need to write your own code for this.
SAS dataset do not have arrays. The ARRAY statement is just something you use in a data step to allow you to reference an actual variable via an index into a list.
You will need to define the lengths of the variables you want to create and the number of them.
So something like this:
data sas.dataset;
set snow.table ;
array HCPCS_CD[4] $5 ;
do index=1 to min(dim(hcpcs_cd),countw(hcpcs_code,'[,]'));
hcpcs_cd[index] = dequote(scan(hcpcs_code,index,'[,]'));
end;
drop index;
run;
Hi Tom,
Thanks for your help. Your code worked but I marked Ksharp's code as solution because I uses programming to determine the number of columns needed as I do not have that information apriori at had. Much appreciate your prompt responses.
Here's the SAS doc page that references the Snowflake ARRAY data type. You could either use Snowflake functions to unpack the array - see the semi-structured data example in the link or you could use SAS functions and statements to do the same thing. It is up to you which method to use. If your arrays are bigger than 32K then unpacking using Snowflake functions is the only option.
Hi SASKiwi,
Thanks for the reference. It is very helpful. In all the examples on that page the name of array elements in the snowflake table is known, [Name, Sex, etc.]. In my case I am not sure how to get that information from existing snowflake table.
Kind regards,
Yes. You need to write a piece of code to get job done.
data have;
input ID HCPCS_CD $50. ;
cards;
1 ["J3490", "J3590", "AAAXT"]
2 ["A990", "B210"]
3 ["C220", "B210", "J3490", "J3590"]
;
*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max trimmed from have;
quit;
*Achieve it;
data want;
set have;
array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
do i=1 to countw(HCPCS_CD,',"[] ','q');
x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
end;
drop i;
run;
Hi Ksharp,
Thanks your code works. Tom's code also worked but I am marking yours as the solution because it finds the array with maximum number of elements and computes the number of columns needed. This info is not available to me apriori. Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?
Kind regards,
“Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?”
I understand your question. But that is not so easy to get it.
Here is how to get the length of longest word:
data have; input ID HCPCS_CD $50. ; cards; 1 ["J3490", "J3590", "AAAXT"] 2 ["A990", "B210"] 3 ["C220", "B210", "J3490", "J3590"] ; *Get the max number of columns in HCPCS_CD; proc sql noprint; select max(countw(HCPCS_CD,',"[] ','q')) into :max trimmed from have; quit; *Get the length of the longest word; data _null_; set have end=last; retain length .; do i=1 to countw(HCPCS_CD,',"[] ','q'); length=max(length,length(dequote(scan(HCPCS_CD,i,',"[] ','q')))); end; if last then call symputx('length',length); run; %put &=length.; data want; set have; array x{*} $ &length. HCPCS_CD1-HCPCS_CD&max. ; do i=1 to countw(HCPCS_CD,',"[] ','q'); x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q')); end; drop i; run;
You don't need to two passes through the data to calculate the maximum number of words and the maximum word length.
And note that double quotes should not be considered a delimiter.
*Get the max number of words and length of the longest word;
data _null_;
set have end=last;
retain length 1 max 1;
n=countw(HCPCS_CD,',[] ','q');
max=max(n,max);
do i=1 to n;
length=max(length,lengthn(dequote(scan(HCPCS_CD,i,',[] ','q'))));
end;
if last then do;
call symputx('length',length);
call symputx('max',max);
end;
run;
%put &=max &=length;
Thanks @Ksharp,
I figured it would involve more datasteps but this is exactly what I was looking for. Much appreciated.
Kind regards,
Hi @arorata ,I do not know what is Snowflake, but the complete code to produce a SAS table per your request base on the data you have is as follows:
data have;
input id hcpcs_cd $32.;
datalines;
1 J3490, J3590, AAAXT
2 A990, B210
3 C220, B210, J3490, J3590
;
run;
proc print data=have;run;
proc sql noprint;
select max(countw(hcpcs_cd,','))
into :num
from have;
quit;
data want;
set have;
array newcol[&num]$ hcpcs_cd1-hcpcs_cd4;
do i=1 to #
newcol[i]=scan(hcpcs_cd,i,',');
end;
drop hcpcs_cd i;
run;
proc print data=want noobs;run;
Hi @arorata , I update the code in my previous thread. Just for your reference. This code can be used when your raw data has more variations (I created two more rows in the raw data to illustrate this ) in the hcpcs_cd column. The code and output are as follows.
/*prepare raw data*/
data have;
input id hcpcs_cd /*$60.*/ $56.;
col_length=length(hcpcs_cd);
retain maxcollngh;
maxcollngh=max(maxcollngh,col_length);
datalines;
1 J3490, J3590, AAAXT
2 A990, B210
3 C220, B210, J3490, J3590
4 A567890AB, B210009CDE, C200, , , CDE2300, AABBCC00112210
5 AABBCCDDEE1234567890
;
run;
proc print data=have noobs;
var id hcpcs_cd;
run;
data _null_;
set have end=last;
if last then
call symputx('maxcollngh',maxcollngh);
run;
%put &maxcollngh;/*56*/
/*determine how many new columns
the new dataset has*/
proc sql noprint;
select max(countw(hcpcs_cd,','))
into :num trimmed
from have;
quit;
%put #/*7*/
/*determine the maximum length
of the new columns*/
data have1;
set have;
array len[&num] newcol_len1-newcol_len#
do i=1 to #
len[i]=length(scan(hcpcs_cd,i,','));
end;
maxlen_row=max(of newcol_len1-newcol_len&num);
retain maxlen_newcol;
maxlen_newcol=max(maxlen_newcol,maxlen_row);
run;
data _null_;
set have1 end=last;
if last then
call symputx('maxlen_newcol',maxlen_newcol);
run;
%put &maxlen_newcol;/*20*/
/*create new dataset*/
data want;
set have;
array newcol[&num]$&maxlen_newcol hcpcs_cd1-hcpcs_cd#
do i=1 to #
newcol[i]=scan(hcpcs_cd,i,',');
end;
keep id hcpcs_cd1-hcpcs_cd7;
run;
proc print data=want noobs;run;
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.