Hello,
I'm stuck on the following problem. I have a dataset that looks like this.
Type Value
animal sheep
animal dog
animal cow
office chair
office table
building skyscraper
building appartment
building flat
building townhouse
I need to concatenate the values per type to 1 single string, seperated by a comma and a space (, ). So then I end up with somethinig like
Type NewValue
animal sheep, dog, cow
office chair, table
building skyscraper, appartment, flat, townhouse
I can transpose this dataset and then concatenate the values together. But the problem is that the number of values per type can vary up to an unknown number. So on one moment in time there could be 4 maximum values per type. On another moment in time there could be 10 values per type.
I figured out I can use ATTRN(NSVARS) to calculate the number of variables/columns that the transpose creates but I can't figure out the syntax mnsto use this number to write it down to something like
catx(', ',&Column1 to &Column
Does anyone have a solution for this?
Many thanks in advance.,
How about using a DOW loop?
DATA HAVE;
INFILE DATALINES;
LENGTH TYPE $8 VALUE $10;
INPUT TYPE $ VALUE $;
DATALINES;
ANIMAL SHEEP
ANIMAL DOG
ANIMAL COW
OFFICE CHAIR
OFFICE TABLE
BUILDING SKYSCRAPER
BUILDING APPARTMENT
BUILDING FLAT
BUILDING TOWNHOUSE
;
RUN;
PROC SORT DATA=HAVE;
BY TYPE;
RUN;
DATA WANT;
LENGTH VALUE2 $2000;
DO UNTIL (LAST.TYPE);
SET HAVE;
BY TYPE;
VALUE2 = CATX(", ",VALUE2,VALUE);
END;
RUN;
Well, the simplest one which jumps to mind is (assuming sorted):
data want (keep=type newvalue);
length newvalue $2000;
set have;
by type;
if first.type then newvalue=value;
else newvalue=strip(newvalue)||","||strip(value);
if last.type then output;
run;
You could also use transpose, just create a template dataset with far more variables than ever will happen and then set that with the transpose (do note however that one variable can only hold up to 2000 characters, thereofr if you have lots of data it will not all fit and you will get truncation - also on previous):
data temp;
attrib val1-val2000 format=$200.;
if _n_<0 then output;
run;
proc transpose data=have out=want prefix=val;
by type;
run;
data want;
set want;
attrib newvalue format=$2000.;
newvalue=catx(',',of val1-val2000);
run;
This didn't work for me unfortunately.
Sorry, which post didn't work, what were the errors/warnings?
DATA HAVE;
INFILE DATALINES;
LENGTH TYPE $8 VALUE $10;
INPUT TYPE $ VALUE $;
DATALINES;
ANIMAL SHEEP
ANIMAL DOG
ANIMAL COW
OFFICE CHAIR
OFFICE TABLE
BUILDING SKYSCRAPER
BUILDING APPARTMENT
BUILDING FLAT
BUILDING TOWNHOUSE
;
RUN;
PROC SORT DATA=HAVE;
BY TYPE;
RUN;
data want2 (keep=type newvalue);
length newvalue $2000;
set have;
by type;
if first.type then newvalue=value;
else newvalue=strip(newvalue)||","||strip(value);
if last.type then output;
run;
There are no errors. But the result looks like this:
,COW | ANIMAL |
,TOWNHOUSE | BUILDING |
,TABLE | OFFICE |
And I couldn't really figure out how to make it work. So I went with Scott_Mitchell's solution 🙂
Yep, I forget the retain statement:
data want2 ;
length newvalue $2000;
set have;
retain newvalue;
by type;
if first.type then newvalue=value;
else newvalue=strip(newvalue)||","||strip(value);
if last.type then output;
run;
Ah, there. Now it works yes. I assumed it was a retain and was messing around with it but I put it in the wrong place in your code. Thanks for the help!
Use retaining and by processing in this well ordered dataset. Your questions to solve:
- maximum length of resulting variable
- output processing on last of the by group
- initialisation on first of the by group.
How about using a DOW loop?
DATA HAVE;
INFILE DATALINES;
LENGTH TYPE $8 VALUE $10;
INPUT TYPE $ VALUE $;
DATALINES;
ANIMAL SHEEP
ANIMAL DOG
ANIMAL COW
OFFICE CHAIR
OFFICE TABLE
BUILDING SKYSCRAPER
BUILDING APPARTMENT
BUILDING FLAT
BUILDING TOWNHOUSE
;
RUN;
PROC SORT DATA=HAVE;
BY TYPE;
RUN;
DATA WANT;
LENGTH VALUE2 $2000;
DO UNTIL (LAST.TYPE);
SET HAVE;
BY TYPE;
VALUE2 = CATX(", ",VALUE2,VALUE);
END;
RUN;
This seems to do the trick without needing the transpose but why? I'm not familliar with putting a do until statement in front of the set statement.
My proposal to programming a retain is more easy to understand. It is sharing the by processing but the retaining is solved different.
The whole trick is based on some definitions with the PDV (program data vector) and the automatic record processing / reinitializing values of recordfields.
http://support.sas.com/resources/papers/proceedings12/052-2012.pdf Scott Mitchels code can be copied and works great.
Thank you very much for the link. I will go look into it 🙂
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.