<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Standardize length of keys when merging multiple datasets in SAS Programming</title>
    <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645280#M192873</link>
    <description>This is what happened. I used PROC IMPORT to import excel variables. Because I couldn't figure out the other way around.</description>
    <pubDate>Tue, 05 May 2020 12:55:57 GMT</pubDate>
    <dc:creator>Cruise</dc:creator>
    <dc:date>2020-05-05T12:55:57Z</dc:date>
    <item>
      <title>Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645146#M192806</link>
      <description>&lt;P&gt;Hi Folks:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I'm merging more than twenty survey datasets together with different sources. Is there any way to standardize the length of ID1NAME &amp;amp; IDNAME across all datasets efficiently? Specifying the length of variables for each datasets one by one would be tedious.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Any hints are appreciated.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data alldata_id; /*259*/ 
merge m.reference_columns
Apt_rent_id Car_per_pp_id 
Hh_type_id N_houses_id N_manufact_id 
N_service_comp_id N_students_per_class_id  
N_welfare_facility_id Ratio_health_welfare_id 
Urbanicity_id Workers_per_pp_id Disability_id
Gdp_id Insured_pop_id N_employees_id 
Retail_workers_id Vac_rate_id;
by id1name idname;
run;&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;But i got the error below:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;WARNING: Multiple lengths were specified for the BY variable ID1NAME by input data sets. This
         might cause unexpected results.
WARNING: Multiple lengths were specified for the BY variable IDNAME by input data sets. This
         might cause unexpected results.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 250 observations read from the data set M.REFERENCE_COLUMNS.
NOTE: There were 172 observations read from the data set WORK.APT_RENT_ID.
NOTE: There were 245 observations read from the data set WORK.CAR_PER_PP_ID.
NOTE: There were 245 observations read from the data set WORK.HH_TYPE_ID.
NOTE: There were 320 observations read from the data set WORK.N_HOUSES_ID.
NOTE: There were 245 observations read from the data set WORK.N_MANUFACT_ID.
NOTE: There were 245 observations read from the data set WORK.N_SERVICE_COMP_ID.
NOTE: There were 245 observations read from the data set WORK.N_STUDENTS_PER_CLASS_ID.
NOTE: There were 245 observations read from the data set WORK.N_WELFARE_FACILITY_ID.
NOTE: There were 245 observations read from the data set WORK.RATIO_HEALTH_WELFARE_ID.
NOTE: There were 244 observations read from the data set WORK.URBANICITY_ID.
NOTE: There were 245 observations read from the data set WORK.WORKERS_PER_PP_ID.
NOTE: There were 249 observations read from the data set WORK.DISABILITY_ID.
NOTE: There were 228 observations read from the data set WORK.GDP_ID.
NOTE: There were 250 observations read from the data set WORK.INSURED_POP_ID.
NOTE: There were 228 observations read from the data set WORK.N_EMPLOYEES_ID.
NOTE: There were 228 observations read from the data set WORK.RETAIL_WORKERS_ID.
NOTE: There were 264 observations read from the data set WORK.VAC_RATE_ID.
NOTE: The data set WORK.ALLDATA_ID has 383 observations and 39 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 05 May 2020 02:42:53 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645146#M192806</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T02:42:53Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645151#M192811</link>
      <description>&lt;P&gt;&lt;EM&gt;&amp;gt; Is there any way to standardize the length of ID1NAME &amp;amp; IDNAME ... would be&amp;nbsp;&lt;/EM&gt;&lt;I&gt;tedious&lt;/I&gt;&lt;/P&gt;
&lt;P&gt;Short answer: No there's no non-tedious way.&lt;/P&gt;
&lt;P&gt;You have a data quality problem and the source data should be painstakingly standardised by hand.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;There is a lazy way out though: Add something like&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;length id1name idname $20;&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;at the top of data step.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;This does not replace vetting and cleaning the source data.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 03:13:06 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645151#M192811</guid>
      <dc:creator>ChrisNZ</dc:creator>
      <dc:date>2020-05-05T03:13:06Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645157#M192816</link>
      <description>&lt;P&gt;&lt;a href="https://communities.sas.com/t5/user/viewprofilepage/user-id/132289"&gt;@Cruise&lt;/a&gt;&amp;nbsp; - You should also take note of this:&lt;/P&gt;
&lt;BLOCKQUOTE&gt;
&lt;PRE class="language-sas"&gt;&lt;CODE&gt;NOTE: MERGE statement has more than one data set with repeats of BY values.&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;That means more than one of your input datasets has duplicate BY variable values. The MERGE statement cannot handle this, so the duplicate rows on your second and subsequent datasets will be dropped. Is this what you want?&lt;/P&gt;
&lt;/BLOCKQUOTE&gt;</description>
      <pubDate>Tue, 05 May 2020 04:20:09 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645157#M192816</guid>
      <dc:creator>SASKiwi</dc:creator>
      <dc:date>2020-05-05T04:20:09Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645172#M192831</link>
      <description>&lt;P&gt;&lt;STRONG&gt;THE&lt;/STRONG&gt; way for treating this is to fix your import process(es). Variable attributes must be standardized along their documented values when the data is read into SAS. Do &lt;U&gt;NOT&lt;/U&gt; use PROC IMPORT and/or Excel files.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 06:10:33 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645172#M192831</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-05-05T06:10:33Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645278#M192871</link>
      <description>Thank you so much for the pointer. This means I have to investigate the source of duplicates. Is there a way to know which particular datasets had duplicates that needs my investigation in the data merge process? How did you know that there were duplicates? These are invaluable info to me.</description>
      <pubDate>Tue, 05 May 2020 12:54:13 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645278#M192871</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T12:54:13Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645280#M192873</link>
      <description>This is what happened. I used PROC IMPORT to import excel variables. Because I couldn't figure out the other way around.</description>
      <pubDate>Tue, 05 May 2020 12:55:57 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645280#M192873</guid>
      <dc:creator>Cruise</dc:creator>
      <dc:date>2020-05-05T12:55:57Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645296#M192877</link>
      <description>&lt;P&gt;Step 1: get rid of Excel files for reading data into SAS. Instead convert the files to CSV or another textual format first. Tools like LibreOffice allow batch-conversion from, say, XLSX to CSV. All this, of course, if your data originates from Excel in the first place, which is unlikely.&lt;/P&gt;
&lt;P&gt;Step 2: read those files with tailored data steps where you have full control over variable attributes.&lt;/P&gt;</description>
      <pubDate>Tue, 05 May 2020 13:47:23 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645296#M192877</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-05-05T13:47:23Z</dc:date>
    </item>
    <item>
      <title>Re: Standardize length of keys when merging multiple datasets</title>
      <link>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645302#M192879</link>
      <description>&lt;P&gt;The many-to-many might be caused by the length issue. Suppose something has an ID code of 12345X, and something else (in a later run) 12345XY. Since after the first import the code will be limited to a length of 6, the Y will be cut off when merging. Like&lt;/P&gt;
&lt;PRE&gt;&lt;CODE class=" language-sas"&gt;data have1;
input ID $5.;
datalines;
12345X
12345X
;

data have2;
input ID $6.;
datalines;
12345X
12345XY
;

data want;
merge
  have1
  have2
;
by id;
run;&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Tue, 05 May 2020 14:07:55 GMT</pubDate>
      <guid>https://communities.sas.com/t5/SAS-Programming/Standardize-length-of-keys-when-merging-multiple-datasets/m-p/645302#M192879</guid>
      <dc:creator>Kurt_Bremser</dc:creator>
      <dc:date>2020-05-05T14:07:55Z</dc:date>
    </item>
  </channel>
</rss>

