BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
taupirho
Calcite | Level 5

Is there a general purpose way of concatenating each variable in an observation into one larger variable whilst preserving the format of numeric/currency fields in terms of how it looks when you do a proc print on the dataset. (see sashelp.shoes for example)

 

Here is some code you can run, as you can see when looking at the log, using the catx function to produce a comma separated output removes both the $ currency sign as well as the period from the numeric variables

 

 

proc print data=sashelp.shoes (obs=10);

run;
proc sql;
 select name into :varstr2 separated by ','
 from dictionary.columns
 where libname = "SASHELP" and
 memname = "SHOES";

 quit;


data stuff();

  format all $5000.;

  set sashelp.shoes ;

  all = catx(',',&varstr2.) ;

  put all;
  run;

 

1 ACCEPTED SOLUTION

Accepted Solutions
Tom
Super User Tom
Super User

If you really wanted to do this you could use the CALL VNEXT() function and the VVALUEX() function to loop over all of the variables.

You could even add in some smarts to deal with missing values and adding quotes around values with special characters.

Let's make a sample little dataset to use.

data have;
  set sashelp.shoes(obs=5);
  if _n_=2 then call missing(region);
  if _n_=3 then call missing(stores);
  if _n_=4 then product='.';
run;

So we will need to make some new variables to drive the logic.  At least one to take the name of the variable. But more to add more flexibility.

data want;
  set have;
  length __name $32 __type __dlm $1 __value __all $5000 ;
  __dlm=',';
  do _n_=1 by 1 while (1);
    call vnext(__name,__type);
    if __name='__name' then leave;
    __value=vvaluex(__name);
    if strip(__value) = getoption('missing') and __type='N' then __value=' ';
    if indexc(__value,'"','0D0A'x,__dlm) then __value=quote(strip(__value));
    if _n_=1 then __all=__value;
    else __all=cats(__all,__dlm,__value);
  end;
  drop __name __type __value __dlm ;
run;

So the loop will use CALL VNEXT() to find the next variables name and then use VVALUEX() to get the formatted value.

It then checks to make sure that the formatted value for a numeric variable is just the character being used for missing values as set in the MISSING option.  Then it adds quotes around values that have a quote or the delimiter or carriage return or a linefeed.

Then when it is the first variable is just stores the value, but for the additional variables at adds in the delimiter.

Finally it drops the extra variables it created.

Obs    Region    Product         Subsidiary     Stores           Sales       Inventory         Returns

 1     Africa    Boot            Addis Ababa      12           $29,761        $191,821            $769
 2               Men's Casual    Addis Ababa       4           $67,242        $118,036          $2,284
 3     Africa    Men's Dress     Addis Ababa       .           $76,793        $136,273          $2,433
 4     Africa    .               Addis Ababa      10           $62,819        $204,284          $1,861
 5     Africa    Slipper         Addis Ababa      14           $68,641        $279,795          $1,771

Obs                                __all

 1     Africa,Boot,Addis Ababa,12,"$29,761","$191,821",$769
 2     ,Men's Casual,Addis Ababa,4,"$67,242","$118,036","$2,284"
 3     Africa,Men's Dress,Addis Ababa,,"$76,793","$136,273","$2,433"
 4     Africa,.,Addis Ababa,10,"$62,819","$204,284","$1,861"
 5     Africa,Slipper,Addis Ababa,14,"$68,641","$279,795","$1,771"

Here is a version converted to a macro that uses a temporary array to reduce the chance of conflicts between the names of variables in the input dataset and the variables needed for the processing.

%macro cat_all
(inds=     /* Input dataset name */
,outds=    /* Output dataset name */
,dlm=','   /* Delimiter (as quoted string) */
,var=all   /* New variable name */
,len=5000  /* Max length of new variable */
);

/* 
The DO loop will stop when at reaches the new variable.
The _N_ counter is used to be able to tell when processing the first
variable.

Uses a temporary array to reduce potential name conflicts.
So only variable names not allowed in the input dataset are
the value of &VAR and the name ___ used for the array.

Meaning of the Array entries
___[1] is NAME of the current variable
___[2] is TYPE of the current variable 
___[3] is VALUE of the current variable

Because the temporary array values are retained need to force 
___[1] to empty so that CALL VNEXT() starts at first variable name each 
iteration of the data step loop.

If the variable is numeric and its only value is the character used
by the MISSING option then force value to blanks.

When value contains quotes, CR, LF or the delimiter then add quotes 
around the value.

Will not work with space as the delimiter.

*/

data &outds;
  set &inds;
  length &var $&len ;
  array ___ [3] $&len _temporary_;
  ___[1]=' ';
  do _n_=1 by 1 while (1);
    call vnext(___[1],___[2]);
    if ___[1]="&var" then leave;
    ___[3]=vvaluex(___[1]);
    if strip(___[3]) = getoption('missing') and ___[2]='N' then ___[3]=' ';
    if indexc(___[3],'"','0D0A'x,&dlm) then ___[3]=quote(strip(___[3]));
    if _n_=1 then &var=___[3];
    else &var=cats(&var,&dlm,___[3]);
  end;
run;

%mend cat_all;

View solution in original post

10 REPLIES 10
tomrvincent
Rhodochrosite | Level 12
Calc name based on the data type and format of each field, converting numbers and dates into the char versions you want.
andreas_lds
Jade | Level 19

Why do you want all values in one variable?

The function vvalue returns the formatted value of a variable, and vvaluex allows the usage of expressions:

 

proc sql noprint;
   select Name format=$quote32.
      into :NameList separated by ','
      from sashelp.vcolumn
         where libname = 'SASHELP' and memname = 'SHOES'
      ;
quit;

%put &=NameList;

data narf;
   set sashelp.shoes;

   length everything anything $ 1000 _var $ 32;

   everything = catx(',',of _all_);

   do _var = &NameList.;
      anything = catx(',', anything, vvaluex(_var));
   end;

   keep everything anything;
run;

proc print;run;
taupirho
Calcite | Level 5

Is there a way to have a null field instead of a period for missing data?

 

So instead of A,B,.,D we would have A,B,,D

ballardw
Super User

See if this comes close to what you want. This uses a temporary text file that you should set to a location and name of your choice.

 

May want to run this code once with your data set a length for the LONGVAR that is appropriate.

filename myfile "x:\data\junk.csv";
Proc export data=sashelp.shoes
     file= myfile
     dbms=csv replace;
run;

data want;
  infile myfile dlm=',' lrecl=1000;
  input;
  longvar= _infile_;
run;
taupirho
Calcite | Level 5
I don't have access to the filesystem to save files to
Tom
Super User Tom
Super User

@taupirho wrote:
I don't have access to the filesystem to save files to

Just use the TEMP engine on the FILENAME statement to make a temporary file.

Tom
Super User Tom
Super User

The PUT statement already knows how to do this.

data _null_;
   set sashelp.shoes(obs=3);
   file 'myfile.csv' dsd ;
   put (_all_) (+0);
run;

What is the goal of this exercise? If you want to create a file why not just write the file directly?

data _null_;
   set sashelp.shoes(obs=3);
   file 'myfile.csv' dsd ;
   put (_all_) (+0);
run;

If you really do want to get that string into a dataset variable then you could just read the file back in.

filename junk temp;
data _null_;
   set sashelp.shoes(obs=3);
   file junk dsd ;
   put (_all_) (+0);
run;
data stuff ;
  set sashelp.shoes(obs=3);
  infile junk truncover ;
  input all $5000.;
run;

You can even do it without creating the full text file. Instead just write one line at a time and save the line into a variable using the _FILE_ automatic variable.

filename junk temp;
data stuff;
  set sashelp.shoes(obs=3);
  file junk dsd lrecl=5000 ;
  length all $5000;
  put @1 (_all_) (+0) +(-2) '  ' @;
  all = _file_;
  output;
  all=' ';
  put @1 all $5000. @;
run;

 

taupirho
Calcite | Level 5

I don't want to write a file!. When I concatenate the fields together, missing fields get represented by a period character. Can it be made so that nothing is output for a missing field (without writing files!). So instead of A,B.,.,.,.,.,Y I get A,B,,,,,Y

 

Also a supplementary question. Does the field separator have to be a printable character. Can it be something like a control character such as CTRL-A

Reeza
Super User

@taupirho wrote:

I don't want to write a file!.


How are we supposed to know that?

 


@taupirho wrote:

When I concatenate the fields together, missing fields get represented by a period character. Can it be made so that nothing is output for a missing field (without writing files!). So instead of A,B.,.,.,.,.,Y I get A,B,,,,,Y

 


This depends on how you use the CAT functions. If you use OF it doesn't do that, if you use the full list it does. 


@taupirho wrote:

Also a supplementary question. Does the field separator have to be a printable character. Can it be something like a control character such as CTRL-A


Did you try it? What happened? It doesn't have to be a printable character, but then it depends on how the output is displayed in your destination. Listing will likely show it it differently than HTML or PDF. So you may not see it either which makes it harder to debug. 

 



Some one else has a very similar question right now, so I'm assuming this is homework or something?

https://communities.sas.com/t5/New-SAS-User/How-to-write-a-function-to-convert-matrix-into-CSV-in-SA...

 

There is a request to have this functionality built into SAS but it currently doesn't exist:

https://communities.sas.com/t5/SASware-Ballot-Ideas/Concatenate-function-using-underlying-assigned-f...

 

You can roll your own with PROC FCMP though or see the example of a solution in the comments. 

Tom
Super User Tom
Super User

If you really wanted to do this you could use the CALL VNEXT() function and the VVALUEX() function to loop over all of the variables.

You could even add in some smarts to deal with missing values and adding quotes around values with special characters.

Let's make a sample little dataset to use.

data have;
  set sashelp.shoes(obs=5);
  if _n_=2 then call missing(region);
  if _n_=3 then call missing(stores);
  if _n_=4 then product='.';
run;

So we will need to make some new variables to drive the logic.  At least one to take the name of the variable. But more to add more flexibility.

data want;
  set have;
  length __name $32 __type __dlm $1 __value __all $5000 ;
  __dlm=',';
  do _n_=1 by 1 while (1);
    call vnext(__name,__type);
    if __name='__name' then leave;
    __value=vvaluex(__name);
    if strip(__value) = getoption('missing') and __type='N' then __value=' ';
    if indexc(__value,'"','0D0A'x,__dlm) then __value=quote(strip(__value));
    if _n_=1 then __all=__value;
    else __all=cats(__all,__dlm,__value);
  end;
  drop __name __type __value __dlm ;
run;

So the loop will use CALL VNEXT() to find the next variables name and then use VVALUEX() to get the formatted value.

It then checks to make sure that the formatted value for a numeric variable is just the character being used for missing values as set in the MISSING option.  Then it adds quotes around values that have a quote or the delimiter or carriage return or a linefeed.

Then when it is the first variable is just stores the value, but for the additional variables at adds in the delimiter.

Finally it drops the extra variables it created.

Obs    Region    Product         Subsidiary     Stores           Sales       Inventory         Returns

 1     Africa    Boot            Addis Ababa      12           $29,761        $191,821            $769
 2               Men's Casual    Addis Ababa       4           $67,242        $118,036          $2,284
 3     Africa    Men's Dress     Addis Ababa       .           $76,793        $136,273          $2,433
 4     Africa    .               Addis Ababa      10           $62,819        $204,284          $1,861
 5     Africa    Slipper         Addis Ababa      14           $68,641        $279,795          $1,771

Obs                                __all

 1     Africa,Boot,Addis Ababa,12,"$29,761","$191,821",$769
 2     ,Men's Casual,Addis Ababa,4,"$67,242","$118,036","$2,284"
 3     Africa,Men's Dress,Addis Ababa,,"$76,793","$136,273","$2,433"
 4     Africa,.,Addis Ababa,10,"$62,819","$204,284","$1,861"
 5     Africa,Slipper,Addis Ababa,14,"$68,641","$279,795","$1,771"

Here is a version converted to a macro that uses a temporary array to reduce the chance of conflicts between the names of variables in the input dataset and the variables needed for the processing.

%macro cat_all
(inds=     /* Input dataset name */
,outds=    /* Output dataset name */
,dlm=','   /* Delimiter (as quoted string) */
,var=all   /* New variable name */
,len=5000  /* Max length of new variable */
);

/* 
The DO loop will stop when at reaches the new variable.
The _N_ counter is used to be able to tell when processing the first
variable.

Uses a temporary array to reduce potential name conflicts.
So only variable names not allowed in the input dataset are
the value of &VAR and the name ___ used for the array.

Meaning of the Array entries
___[1] is NAME of the current variable
___[2] is TYPE of the current variable 
___[3] is VALUE of the current variable

Because the temporary array values are retained need to force 
___[1] to empty so that CALL VNEXT() starts at first variable name each 
iteration of the data step loop.

If the variable is numeric and its only value is the character used
by the MISSING option then force value to blanks.

When value contains quotes, CR, LF or the delimiter then add quotes 
around the value.

Will not work with space as the delimiter.

*/

data &outds;
  set &inds;
  length &var $&len ;
  array ___ [3] $&len _temporary_;
  ___[1]=' ';
  do _n_=1 by 1 while (1);
    call vnext(___[1],___[2]);
    if ___[1]="&var" then leave;
    ___[3]=vvaluex(___[1]);
    if strip(___[3]) = getoption('missing') and ___[2]='N' then ___[3]=' ';
    if indexc(___[3],'"','0D0A'x,&dlm) then ___[3]=quote(strip(___[3]));
    if _n_=1 then &var=___[3];
    else &var=cats(&var,&dlm,___[3]);
  end;
run;

%mend cat_all;

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 10 replies
  • 1183 views
  • 0 likes
  • 6 in conversation