BookmarkSubscribeRSS Feed
☑ This topic is solved. Need further help from the community? Please sign in and ask a new question.
arorata
Obsidian | Level 7

Hi, I am very new to Snowflake. I am trying to create a SAS dataset from a snowflake table using SAS/ACCESS interface to Snowflake. I can successfully connect to Snowflake database and can create a SAS dataset without errors. Problem is that some of the columns are defined as arrays in Snowflake and after bringing data Snowflake databases into SAS environment, SAS reads array objects as a string where each element of original Snowflake array column is quoted and concatenated inside of a square bracket. I would like to read each element of the Snowflake array as a column in SAS. To add to the complexity a Snowflake column array may have different number of elements in each row.

This is what I have:

ID Snowflake Array Column: HCPCS_CD PROC PRINT showing SAS column as a string: HCPCS_CODE
1 J3490, J3590, AAAXT ["J3490", "J3590", "AAAXT"]
2 A990, B210 ["A990", "B210"]
3 C220, B210, J3490, J3590 ["C220", "B210", "J3490", "J3590"]

 

I would like SAS dataset to have variable HCPCS_CD into multiple character columns one for each element of the array:

ID
HCPCS_CD1 HCPCS_CD2 HCPCS_CD3 HCPCS_CD4
1 J3490 J3590 AAAXT  
2 A990 B210    
3 C220 B210 J3490 J3590

 

If SAS can read a Snowflake array column directly as a SAS array that would work too.

Thanks and appreciate any help.

1 ACCEPTED SOLUTION

Accepted Solutions
Ksharp
Super User

Yes. You need to write a piece of code to get job done.


data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Achieve it;
data want;
 set have;
 array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

Ksharp_0-1758785169866.png

 

View solution in original post

17 REPLIES 17
Tom
Super User Tom
Super User

You will probably need to write your own code for this.

SAS dataset do not have arrays.  The ARRAY statement is just something you use in a data step to allow you to reference an actual variable via an index into a list.

 

You will need to define the lengths of the variables you want to create and the number of them.

So something like this:

data sas.dataset;
  set snow.table ;
  array HCPCS_CD[4] $5 ;
  do index=1 to min(dim(hcpcs_cd),countw(hcpcs_code,'[,]'));
     hcpcs_cd[index] = dequote(scan(hcpcs_code,index,'[,]'));
  end;
  drop index;
run;
arorata
Obsidian | Level 7

Hi Tom,

Thanks for your help. Your code worked but I marked Ksharp's code as solution because I uses programming to determine the number of columns needed as I do not have that information apriori at had. Much appreciate your prompt responses.

SASKiwi
PROC Star

Here's the SAS doc page that references the Snowflake ARRAY data type. You could either use Snowflake functions to unpack the array - see the semi-structured data example in the link or you could use SAS functions and statements to do the same thing. It is up to you which method to use. If your arrays are bigger than 32K then unpacking using Snowflake functions is the only option.

arorata
Obsidian | Level 7

Hi SASKiwi,

Thanks for the reference. It is very helpful. In all the examples on that page the name of array elements in the snowflake table is known, [Name, Sex, etc.]. In my case I am not sure how to get that information from existing snowflake table.

Kind regards,

Ksharp
Super User

Yes. You need to write a piece of code to get job done.


data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Achieve it;
data want;
 set have;
 array x{&max.} $ 80 HCPCS_CD1-HCPCS_CD&max. ; *Each column has length 80;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

Ksharp_0-1758785169866.png

 

arorata
Obsidian | Level 7

Hi Ksharp,

Thanks your code works. Tom's code also worked but I am marking yours as the solution because it finds the array with maximum number of elements and computes the number of columns needed. This info is not available to me apriori. Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?

Kind regards,

Ksharp
Super User

Along the same line of thinking is it also possible to determine the length of the longest "word" in the array so I can efficiently set the length of the variables?”

I understand your question. But that is not so easy to get it.

Here is how to get the length of longest word:

data have;
input ID HCPCS_CD $50. ;
cards;
1    ["J3490", "J3590", "AAAXT"]
2	 ["A990", "B210"]
3    ["C220", "B210", "J3490", "J3590"]
;



*Get the max number of columns in HCPCS_CD;
proc sql noprint;
select max(countw(HCPCS_CD,',"[] ','q')) into :max  trimmed from have;
quit;

*Get the  length of the longest word;
data _null_;
 set have end=last;
 retain length .;
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   length=max(length,length(dequote(scan(HCPCS_CD,i,',"[] ','q'))));
 end;
 if last then call symputx('length',length);
run;
%put &=length.;

data want;
 set have;
 array x{*} $ &length. HCPCS_CD1-HCPCS_CD&max. ; 
 do i=1 to countw(HCPCS_CD,',"[] ','q');
   x{i}=dequote(scan(HCPCS_CD,i,',"[] ','q'));
 end;
 drop i;
run;

 

 

Tom
Super User Tom
Super User

You don't need to two passes through the data to calculate the maximum number of words and the maximum word length.

 

And note that double quotes should not be considered a delimiter.

*Get the  max number of words and length of the longest word;
data _null_;
  set have end=last;
  retain length 1 max 1;
  n=countw(HCPCS_CD,',[] ','q');
  max=max(n,max);
  do i=1 to n;
    length=max(length,lengthn(dequote(scan(HCPCS_CD,i,',[] ','q'))));
  end;
  if last then do;
    call symputx('length',length);
    call symputx('max',max);
  end;
run;
%put &=max &=length;
arorata
Obsidian | Level 7

Hi @Tom,

Thanks for this elegant solution. 

Kind regards,

Ksharp
Super User
"And note that double quotes should not be considered a delimiter."
Tom, that is the same thing. 'q' would ignore the double quotes too. You can test it if you don't believe.

And I agree with you combine these two step into one step to save PC source.
Tom
Super User Tom
Super User

The purpose of the Q modifier is to treat delimiters inside of quotes as normal characters.  But if the double quote character is one of the delimiters then stings quoted with double quotes would look like separate words.  Meaning you would need to use single quotes to mask delimiter characters.

 

But you might want to actually test it as SAS's implementation may not use the same order of operations ( between checking for quoted strings versus checking for delimiters) as I would have done.

 

Spoiler
26   data test;
27     input string $40.;
28     without = scan(string,2,'[ ,]','q');
29     with = scan(string,2,'[" ,]','q');
30     put (w:) (=);
31   cards;

without="2 x" with="2 x"

So it will treat the matched quotes as indicating one word.

But the result also shows that adding double quote as a delimiter is not needed.

Ksharp
Super User
Tom,
Agree with you.
I think this situation is barely appeared .
Normally , double quote would not be in a word. Or double quote would not be a part of a word.
arorata
Obsidian | Level 7

Thanks @Ksharp,

I figured it would involve more datasteps but this is exactly what I was looking for. Much appreciated.

Kind regards,

dxiao2017
Lapis Lazuli | Level 10
Hi arorata, I think all of the steps can be combined into just one data step, i.e., it is possible using one data step to solve the whole question, am I right, what do you think?

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 17 replies
  • 1709 views
  • 14 likes
  • 5 in conversation