BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
arorata
Obsidian | Level 7

Hi, I am very new to Snowflake. I am trying to create a SAS dataset from a snowflake table using SAS/ACCESS interface to Snowflake. I can successfully connect to Snowflake database and can create a SAS dataset without errors. Problem is that some of the columns are defined as arrays in Snowflake and after bringing data Snowflake databases into SAS environment, SAS reads array objects as a string where each element of original Snowflake array column is quoted and concatenated inside of a square bracket. I would like to read each element of the Snowflake array as a column in SAS. To add to the complexity a Snowflake column array may have different number of elements in each row.

This is what I have:

ID Snowflake Array Column: HCPCS_CD PROC PRINT showing SAS column as a string: HCPCS_CODE
1 J3490, J3590, AAAXT ["J3490", "J3590", "AAAXT"]
2 A990, B210 ["A990", "B210"]
3 C220, B210, J3490, J3590 ["C220", "B210", "J3490", "J3590"]

 

I would like SAS dataset to have variable HCPCS_CD into multiple character columns one for each element of the array:

ID
HCPCS_CD1 HCPCS_CD2 HCPCS_CD3 HCPCS_CD4
1 J3490 J3590 AAAXT  
2 A990 B210    
3 C220 B210 J3490 J3590

 

If SAS can read a Snowflake array column directly as a SAS array that would work too.

Thanks and appreciate any help.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Yes. You need to write a piece of code to get job done.


data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Achieve it;
data want;
 set have;
 array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

Ksharp_0-1758785169866.png

 

View solution in original post

13 REPLIES 13
Tom
Super User Tom
Super User

You will probably need to write your own code for this.

SAS dataset do not have arrays.  The ARRAY statement is just something you use in a data step to allow you to reference an actual variable via an index into a list.

 

You will need to define the lengths of the variables you want to create and the number of them.

So something like this:

data sas.dataset;
  set snow.table ;
  array HCPCS_CD[4] $5 ;
  do index=1 to min(dim(hcpcs_cd),countw(hcpcs_code,'[,]'));
     hcpcs_cd[index] = dequote(scan(hcpcs_code,index,'[,]'));
  end;
  drop index;
run;
arorata
Obsidian | Level 7

Hi Tom,

Thanks for your help. Your code worked but I marked Ksharp's code as solution because I uses programming to determine the number of columns needed as I do not have that information apriori at had. Much appreciate your prompt responses.

SASKiwi
PROC Star

Here's the SAS doc page that references the Snowflake ARRAY data type. You could either use Snowflake functions to unpack the array - see the semi-structured data example in the link or you could use SAS functions and statements to do the same thing. It is up to you which method to use. If your arrays are bigger than 32K then unpacking using Snowflake functions is the only option.

arorata
Obsidian | Level 7

Hi SASKiwi,

Thanks for the reference. It is very helpful. In all the examples on that page the name of array elements in the snowflake table is known, [Name, Sex, etc.]. In my case I am not sure how to get that information from existing snowflake table.

Kind regards,

Ksharp
Super User

Yes. You need to write a piece of code to get job done.


data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Achieve it;
data want;
 set have;
 array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

Ksharp_0-1758785169866.png

 

arorata
Obsidian | Level 7

Hi Ksharp,

Thanks your code works. Tom's code also worked but I am marking yours as the solution because it finds the array with maximum number of elements and computes the number of columns needed. This info is not available to me apriori. Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?

Kind regards,

Ksharp
Super User

Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?”

I understand your question. But that is not so easy to get it.

Here is how to get the length of longest word:

data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Get the  length of the longest word;
data _null_;
 set have end=last;
 retain length .;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   length=max(length,length(dequote(scan(HCPCS_CD,i,',"[] ','q'))));
 end;
 if last then call symputx('length',length);
run;
%put &=length.;

data want;
 set have;
 array x{*} $ &length. HCPCS_CD1-HCPCS_CD&max. ; 
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

 

Tom
Super User Tom
Super User

You don't need to two passes through the data to calculate the maximum number of words and the maximum word length.

 

And note that double quotes should not be considered a delimiter.

*Get the  max number of words and length of the longest word;
data _null_;
  set have end=last;
  retain length 1 max 1;
  n=countw(HCPCS_CD,',[] ','q');
  max=max(n,max);
  do i=1 to n;
    length=max(length,lengthn(dequote(scan(HCPCS_CD,i,',[] ','q'))));
  end;
  if last then do;
    call symputx('length',length);
    call symputx('max',max);
  end;
run;
%put &=max &=length;
arorata
Obsidian | Level 7

Hi @Tom,

Thanks for this elegant solution. 

Kind regards,

Ksharp
Super User
"And note that double quotes should not be considered a delimiter."
Tom, that is the same thing. 'q' would ignore the double quotes too. You can test it if you don't believe.

And I agree with you combine these two step into one step to save PC source.
arorata
Obsidian | Level 7

Thanks @Ksharp,

I figured it would involve more datasteps but this is exactly what I was looking for. Much appreciated.

Kind regards,

dxiao2017
Lapis Lazuli | Level 10

Hi @arorata ,I do not know what is Snowflake, but the complete code to produce a SAS table per your request base on the data you have is as follows:

data have;
   input id hcpcs_cd $32.;
   datalines;
1 J3490, J3590, AAAXT
2 A990, B210
3 C220, B210, J3490, J3590
;
run;
proc print data=have;run;
proc sql noprint;
select max(countw(hcpcs_cd,','))
   into :num
   from have;
quit;
data want;
   set have;
   array newcol[&num]$ hcpcs_cd1-hcpcs_cd4;
   do i=1 to #
      newcol[i]=scan(hcpcs_cd,i,',');
   end;
   drop hcpcs_cd i;
run;
proc print data=want noobs;run;

dxiao2017_0-1758815917871.png

 

dxiao2017
Lapis Lazuli | Level 10

Hi @arorata , I update the code in my previous thread. Just for your reference. This code can be used when your raw data has more variations (I created two more rows in the raw data to illustrate this ) in the hcpcs_cd column. The code and output are as follows.

/*prepare raw data*/
data have;
   input id hcpcs_cd /*$60.*/ $56.;
   col_length=length(hcpcs_cd);
   retain maxcollngh;
   maxcollngh=max(maxcollngh,col_length);
   datalines;
1 J3490, J3590, AAAXT
2 A990, B210
3 C220, B210, J3490, J3590
4 A567890AB, B210009CDE, C200, , , CDE2300, AABBCC00112210
5 AABBCCDDEE1234567890
;
run;
proc print data=have noobs;
   var id hcpcs_cd;
run;
data _null_;
   set have end=last;
   if last then 
      call symputx('maxcollngh',maxcollngh);
run;
%put &maxcollngh;/*56*/
/*determine how many new columns
the new dataset has*/
proc sql noprint;
select max(countw(hcpcs_cd,','))
   into :num trimmed
   from have;
quit;
%put #/*7*/
/*determine the maximum length
of the new columns*/
data have1;
   set have;
   array len[&num] newcol_len1-newcol_len#
   do i=1 to #
      len[i]=length(scan(hcpcs_cd,i,','));
   end;
   maxlen_row=max(of newcol_len1-newcol_len&num);
   retain maxlen_newcol;
   maxlen_newcol=max(maxlen_newcol,maxlen_row);
run;
data _null_;
   set have1 end=last;
   if last then 
      call symputx('maxlen_newcol',maxlen_newcol);
run;
%put &maxlen_newcol;/*20*/
/*create new dataset*/
data want;
   set have;
   array newcol[&num]$&maxlen_newcol hcpcs_cd1-hcpcs_cd#
   do i=1 to #
      newcol[i]=scan(hcpcs_cd,i,',');
   end;
   keep id hcpcs_cd1-hcpcs_cd7;
run;
proc print data=want noobs;run;

dxiao2017_0-1758996177888.png

 

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 13 replies
  • 674 views
  • 11 likes
  • 5 in conversation