<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Combining Data Sets Containing Character Variables of Different Lengths in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243188#M268436</link>
    <description>This is how I got the data, I can't control it, the problem comes from free text fields, sometimes over 100 characters. I have 10 datasets with hundreds variables in each, I can't control it. I used PROC Import and told SAS to use maximum variables to define format and length. I see no other way to read the data into SAS, so now I have to "clean" it.</description>
    <pubDate>Wed, 13 Jan 2016 14:06:39 GMT</pubDate>
    <dc:creator>BlueNose</dc:creator>
    <dc:date>2016-01-13T14:06:39Z</dc:date>
    <item>
      <title>Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243165#M268429</link>
      <description>&lt;P&gt;Hello all,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have 10 SAS datasets, which I want to combine together (to concatenate). Some variables appear in all files, some only in some. Each file represent a different visit in a clinical trial.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I had a problem with variables being numeric in one dataset and character in another. There weren't too many of them, so I dealt with it manually.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now a new problem. Some character variables (usually the ones of the free text), have different length in differnt datasets, so I get lot's of warnings in the log, and my data is truncated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Afte looking online, I tried the following thing, before using the SET statement, I used the length, like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data All;
length Var1 $ 160 Var2 $ 150;
Set DS1 DS2 DS3;
run;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So now there are no warnings anymore in the log, however, first of all, the data is still being truncated ! Secondly, when looking at the attributes of the specific variables, I see that the length was changed indeed, but the format wasn't, for example: If Var1 had length 109 in DS1 and 160 in DS2, than I do see 160 in the length, but in format I see $109, which is weird.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So my quesion is, assuming I have several dataset, with hundreds of variables in each, and assuming that the type is finally matching, how do I concatenate them all, if some character variables have different lengths ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thank you !&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 13:14:42 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243165#M268429</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T13:14:42Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243167#M268430</link>
      <description>Your doing it correctly, the format is making it appear as if the data is truncated. &lt;BR /&gt;&lt;BR /&gt;Change the format, similar to how you specified the length  &lt;BR /&gt;&lt;BR /&gt;Another trick is to append the data and making sure the base data is the dataset with the longest length/format, which sets the attributes for the final data set.</description>
      <pubDate>Wed, 13 Jan 2016 13:19:11 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243167#M268430</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2016-01-13T13:19:11Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243176#M268431</link>
      <description>Thank you. I thought about using the longest length/format. However, I have 10 datasets, and each variable can have the longest length in a different dataset, it's too many combinations to check.&lt;BR /&gt;How do I change the format, let's say, for a length of 142?</description>
      <pubDate>Wed, 13 Jan 2016 13:47:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243176#M268431</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T13:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243181#M268432</link>
      <description>&lt;P&gt;Worth testing:&amp;nbsp; I believe if you use SQL to combine the data sets it will automatically solve the length issues.&amp;nbsp; Not sure about the formats, but it's definitely worth trying.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There isn't much reason to apply a format to a lengthy character variable.&amp;nbsp; The best solution for conflicting formats in those cases is probably just to remove the format entirely:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;format longvar;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 13:55:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243181#M268432</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-01-13T13:55:08Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243183#M268433</link>
      <description>&lt;P&gt;Astounding, thank you. Your comment is interesting. What does format longvar does ? Will it work if my files contains both numerical variables AND characteristic variables with varying lenghts ?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Secondly, how do I append a few datasets (let's say 3 or 4 for the example) using SQL ?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 13:58:50 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243183#M268433</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T13:58:50Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243185#M268434</link>
      <description>&lt;P&gt;You can do it in a number of ways, the SQL posted below fixes the length in the first select and all other will then apply to that. &amp;nbsp;You could also query the SASHELP.VCOLUMN to find the biggest if needed. &amp;nbsp;The question is why do you have 10 datasets with different attributes? &amp;nbsp;Is this something you have created, or do you get it from someone else? &amp;nbsp;If its the former then fix it in your input datasets, if its the later check out the data import agreement to see what the data should be like.&lt;/P&gt;
&lt;P&gt;Although after checking, this does exactly the same thing:&lt;/P&gt;
&lt;PRE&gt;data want;
  length a $50 b c 8;
  set have1 have2 have3;
run;&lt;/PRE&gt;
&lt;P&gt;So I am now unsure of the question?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;data have1;
  length a $5;
  a="ABC"; b=2; output;
  a="DEFGT"; b=1; output;
run;

data have2;
  length a $25;
  a="23"; b=1; output;
run;

data have3;
  c=1; b=1; output;
run;

proc sql;
  create table WANT as
  select  A length=25,
          B,
          . as C
  from    HAVE1
  union all
  select  *,
          . as C
  from    HAVE2
  union all 
  select  "" as A,
          *
  from    HAVE3;
quit;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:06:31 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243185#M268434</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-01-13T14:06:31Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243187#M268435</link>
      <description>&lt;P&gt;The FORMAT statement is intended as a DATA step statement.&amp;nbsp; When you don't actually supply a format name in a FORMAT statement, it's asking that the format be removed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;format var1 var2 var3;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This statement asks that any formats for VAR1, VAR2, and VAR3 be removed.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Others are better positioned than myself to show you how to concatenate data sets in SQL.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Good luck.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:06:16 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243187#M268435</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-01-13T14:06:16Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243188#M268436</link>
      <description>This is how I got the data, I can't control it, the problem comes from free text fields, sometimes over 100 characters. I have 10 datasets with hundreds variables in each, I can't control it. I used PROC Import and told SAS to use maximum variables to define format and length. I see no other way to read the data into SAS, so now I have to "clean" it.</description>
      <pubDate>Wed, 13 Jan 2016 14:06:39 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243188#M268436</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T14:06:39Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243190#M268437</link>
      <description>&lt;P&gt;Ah, so its not actually dataset we are talking about then, I am presuming Excel sheets yes? &amp;nbsp;Common problem. &amp;nbsp;Get the data in CSV, i.e. plain text comma separated variable, if you have to, do it yourself - Excel -&amp;gt; SaveAs and select CommaSeparated. &amp;nbsp;Now for each of the files (or you may be able to get away with one) write a basic data step import:&lt;/P&gt;
&lt;PRE&gt;data mydata;
  infile "c:\abc.csv" dlm=",";
  length a $10 b $5...;
  input a $ b;
run;&lt;/PRE&gt;
&lt;P&gt;What this does is read in the data correctly, proc import is a "guessing" procedure, it is trying to guess what you want to do which is never a good thing, and you end up with this length nonsense. &amp;nbsp;Specify the lengths yourself, if you don't have an import agreement then fix lengths as 200.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:10:49 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243190#M268437</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-01-13T14:10:49Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243191#M268438</link>
      <description>&lt;P&gt;Unfortunatelly something is wrong, I tried adding the format statement with the variable name and not specifying a format, and again, the length is now 200, and the format is&amp;nbsp;$109. The data is truncated.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:11:27 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243191#M268438</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T14:11:27Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243193#M268439</link>
      <description>&lt;P&gt;You are right, Excel sheets, and I did convert to CSV. I cannot use your way, because I have hundreds of variables in each file, so I will need to define them all in the length and input statements. Or am I wrong, won't I &amp;nbsp;have to do that?&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:14:03 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243193#M268439</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T14:14:03Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243196#M268440</link>
      <description>&lt;P&gt;While you haven't shown the program/log, I suspect the issue is the positioning of the statement.&amp;nbsp; Even though the LENGTH statement must come before the SET statement, the FORMAT statement must come after the SET statement.&amp;nbsp; LENGTH uses the first setting found, but FORMAT uses the last setting found.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:17:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243196#M268440</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-01-13T14:17:43Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243198#M268441</link>
      <description>&lt;P&gt;That depends on the files themselves, i.e. what the variables are. &amp;nbsp;Is there a fixed number, are they all the same type - things that should be in a data import agreement. &amp;nbsp;&lt;/P&gt;
&lt;P&gt;The problem is if you don't look at it at the import stage, your going to have to look at it somewhere - i.e. your having trouble now with lengths, what about datatypes later on, processing on the data, ...&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:20:08 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243198#M268441</guid>
      <dc:creator>RW9</dc:creator>
      <dc:date>2016-01-13T14:20:08Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243199#M268442</link>
      <description>You are right ! I moved the format to after the set, and now the format is $200, and my data is no longer truncated !&lt;BR /&gt;&lt;BR /&gt;The informat is still 109, does it matter ?</description>
      <pubDate>Wed, 13 Jan 2016 14:22:19 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243199#M268442</guid>
      <dc:creator>BlueNose</dc:creator>
      <dc:date>2016-01-13T14:22:19Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243202#M268443</link>
      <description>&lt;P&gt;It's unlikely to hurt, as you are unlikely to add observations at this point.&amp;nbsp; Still, it would be safer to remove the&amp;nbsp; informats.&amp;nbsp; If you have a FORMAT statement that removes formats, it's should be easy to construct an INFORMAT statement as well.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:26:43 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243202#M268443</guid>
      <dc:creator>Astounding</dc:creator>
      <dc:date>2016-01-13T14:26:43Z</dc:date>
    </item>
    <item>
      <title>Re: Combining Data Sets Containing Character Variables of Different Lengths</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243212#M268444</link>
      <description>&lt;P&gt;Fully agree with &lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/4954"&gt;@Astounding﻿&lt;/a&gt;. I would remove all the permanently associated $&lt;EM&gt;w&lt;/EM&gt;. formats and informats from character variables (&lt;FONT face="courier new,courier"&gt;&lt;FONT face="arial,helvetica,sans-serif"&gt;in a data step or with PROC DATASETS&lt;/FONT&gt;&lt;/FONT&gt;). I have never found these formats helpful in datasets (let alone informats). Please note that $&lt;EM&gt;w&lt;/EM&gt;. (with &lt;EM&gt;w&lt;/EM&gt;=length of the character variable) is the default format for character variables anyway. So, you gain nothing, but incur the risk of truncation and unnecessary "differing attributes"&amp;nbsp;results in PROC COMPARE, if you assign this format permanently.&lt;/P&gt;</description>
      <pubDate>Wed, 13 Jan 2016 14:41:12 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Combining-Data-Sets-Containing-Character-Variables-of-Different/m-p/243212#M268444</guid>
      <dc:creator>FreelanceReinh</dc:creator>
      <dc:date>2016-01-13T14:41:12Z</dc:date>
    </item>
  </channel>
</rss>

