<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Shorten long string, but ensure it remains unique in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194010#M36475</link>
    <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello, &lt;/P&gt;&lt;P&gt;this one is a bit tricky to explain. Here is the original problem though:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a list of 10,000 observations. Each observation has a unique observation ID which is between 100 and 400 characters long:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="1" class="jiveBorder" height="74" style="border: 1px solid rgb(0, 0, 0); width: 389px; height: 58px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;STRONG&gt;OBSERVATION_ID&lt;/STRONG&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;STRONG&gt;VALUE&lt;/STRONG&gt;&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxx&lt;/SPAN&gt;CHARACTERS100&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;100&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxx&lt;/SPAN&gt;CHARACTERS300&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;50&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTERS400&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;25&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;EM&gt;(more observation IDs...)&lt;/EM&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;&lt;EM&gt;...&lt;/EM&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This data set now gets transposed and the observation IDs become the column names. 
The problem is that the observation IDs get truncated to 32 characters, because that is the maximum &lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;length&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt; for&lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt; column names supported by SAS 9.4:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="1" class="jiveBorder" height="74" style="border: 1px solid rgb(0, 0, 0); width: 535px; height: 40px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER100&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER3&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;100&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;50&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;25&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One problem is that they may no longer be unique when truncated. They also can no longer be referenced by people querying the data. 
For instance, a person wants to be able to run a program that divides Observation 1 by Observation 2 and reference those two observations by their observation ID. This is no longer possible, because they would still enter the whole/original observation ID.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Possible Solution:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I thought about creating an MD5 hash for each observation ID. It shortens them to 32 bytes. But is that a safe thing to do considering the source string is much longer than 32 bytes (up to 400 characters long)? Even if it is safe because of its very low collision probability, I wouldn't be able to work with those special characters MD5 generates. So I would have to convert the hashed string using a hexadecimal format, for instance. But wouldn't that tremendously increase the collision probability due to its small character range (I think only 16 distinct characters)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your help!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Phil&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
    <pubDate>Thu, 06 Aug 2015 17:15:23 GMT</pubDate>
    <dc:creator>PhilfromGermany</dc:creator>
    <dc:date>2015-08-06T17:15:23Z</dc:date>
    <item>
      <title>Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194010#M36475</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Hello, &lt;/P&gt;&lt;P&gt;this one is a bit tricky to explain. Here is the original problem though:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a list of 10,000 observations. Each observation has a unique observation ID which is between 100 and 400 characters long:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="1" class="jiveBorder" height="74" style="border: 1px solid rgb(0, 0, 0); width: 389px; height: 58px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;STRONG&gt;OBSERVATION_ID&lt;/STRONG&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;STRONG&gt;VALUE&lt;/STRONG&gt;&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxx&lt;/SPAN&gt;CHARACTERS100&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;100&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxx&lt;/SPAN&gt;CHARACTERS300&lt;/SPAN&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;50&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;P&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTERS400&lt;/SPAN&gt;&lt;/P&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;25&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;&lt;EM&gt;(more observation IDs...)&lt;/EM&gt;&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;&lt;EM&gt;...&lt;/EM&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;This data set now gets transposed and the observation IDs become the column names. 
The problem is that the observation IDs get truncated to 32 characters, because that is the maximum &lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt;length&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px; line-height: 1.5em;"&gt; for&lt;/SPAN&gt;&lt;SPAN style="font-size: 10pt; line-height: 1.5em;"&gt; column names supported by SAS 9.4:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;TABLE border="1" class="jiveBorder" height="74" style="border: 1px solid rgb(0, 0, 0); width: 535px; height: 40px;"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER100&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER3&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;TH style="text-align: center; background-color: #6690bc; color: #ffffff; padding: 2px;" valign="middle"&gt;&lt;P&gt;&lt;STRONG&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;xxxxxxxxxxx&lt;/SPAN&gt;&lt;SPAN style="font-size: 13.3333330154419px;"&gt;CHARACTER&lt;/SPAN&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;/TH&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD style="padding: 2px;"&gt;100&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;50&lt;/TD&gt;&lt;TD style="padding: 2px;"&gt;25&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;One problem is that they may no longer be unique when truncated. They also can no longer be referenced by people querying the data. 
For instance, a person wants to be able to run a program that divides Observation 1 by Observation 2 and reference those two observations by their observation ID. This is no longer possible, because they would still enter the whole/original observation ID.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN style="text-decoration: underline;"&gt;Possible Solution:&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I thought about creating an MD5 hash for each observation ID. It shortens them to 32 bytes. But is that a safe thing to do considering the source string is much longer than 32 bytes (up to 400 characters long)? Even if it is safe because of its very low collision probability, I wouldn't be able to work with those special characters MD5 generates. So I would have to convert the hashed string using a hexadecimal format, for instance. But wouldn't that tremendously increase the collision probability due to its small character range (I think only 16 distinct characters)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your help!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Phil&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 17:15:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194010#M36475</guid>
      <dc:creator>PhilfromGermany</dc:creator>
      <dc:date>2015-08-06T17:15:23Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194011#M36476</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Why do you need to convert this dataset from a long to a wide set?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 17:20:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194011#M36476</guid>
      <dc:creator>dsbihill</dc:creator>
      <dc:date>2015-08-06T17:20:33Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194012#M36477</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Because the end users want to create simple files that hold formulae. For example: Observation 1 + Observation 2. I must load these formulae and execute them. This only works if the underlying observations are all in one row. Otherwise I have to use procedures and complex logic. Let's assume the design is a given.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 17:24:00 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194012#M36477</guid>
      <dc:creator>PhilfromGermany</dc:creator>
      <dc:date>2015-08-06T17:24:00Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194013#M36478</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Just number them.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data middle ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; set have ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; by observation_id ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; if first.observation_id then observation_number+1;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;P&gt;proc transpose data=middle out=want prefix=obs ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; id observation_number ;&lt;/P&gt;&lt;P&gt;&amp;nbsp; var value ;&lt;/P&gt;&lt;P&gt;run;&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 17:44:52 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194013#M36478</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2015-08-06T17:44:52Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194014#M36479</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;And assign the long text to the LABEL.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 18:17:25 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194014#M36479</guid>
      <dc:creator>ballardw</dc:creator>
      <dc:date>2015-08-06T18:17:25Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194015#M36480</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;Tom, how is the end user going to be able to reference the observation ID if I convert it to a number? He only has the actual observation ID (the name), not the number, which I would be assigning dynamically according to your code. Ballardw, can I reference the label like I would reference the column name when aggregating variables? I think the maximum label length is 256 characters, right? That'd still be too short.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 19:55:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194015#M36480</guid>
      <dc:creator>PhilfromGermany</dc:creator>
      <dc:date>2015-08-06T19:55:57Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194016#M36481</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;How were they going to reference the long strings?&lt;/P&gt;&lt;P&gt;Why exactly are you transposing the data in the first place?&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 20:00:48 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194016#M36481</guid>
      <dc:creator>Tom</dc:creator>
      <dc:date>2015-08-06T20:00:48Z</dc:date>
    </item>
    <item>
      <title>Re: Shorten long string, but ensure it remains unique</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194017#M36482</link>
      <description>&lt;HTML&gt;&lt;HEAD&gt;&lt;/HEAD&gt;&lt;BODY&gt;&lt;P&gt;You're expecting a customer to reference a 256+ character string in formulas and not make mistakes?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Build a lookup table and supply it to them.&lt;/P&gt;&lt;/BODY&gt;&lt;/HTML&gt;</description>
      <pubDate>Thu, 06 Aug 2015 20:53:10 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Shorten-long-string-but-ensure-it-remains-unique/m-p/194017#M36482</guid>
      <dc:creator>Reeza</dc:creator>
      <dc:date>2015-08-06T20:53:10Z</dc:date>
    </item>
  </channel>
</rss>

