BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
Cruise
Ammonite | Level 13

Hi Folks: 

I'm merging more than twenty survey datasets together with different sources. Is there any way to standardize the length of ID1NAME & IDNAME across all datasets efficiently? Specifying the length of variables for each datasets one by one would be tedious. 

Any hints are appreciated.

 

data alldata_id; /*259*/ 
merge m.reference_columns
Apt_rent_id Car_per_pp_id 
Hh_type_id N_houses_id N_manufact_id 
N_service_comp_id N_students_per_class_id  
N_welfare_facility_id Ratio_health_welfare_id 
Urbanicity_id Workers_per_pp_id Disability_id
Gdp_id Insured_pop_id N_employees_id 
Retail_workers_id Vac_rate_id;
by id1name idname;
run;

But i got the error below:

WARNING: Multiple lengths were specified for the BY variable ID1NAME by input data sets. This
         might cause unexpected results.
WARNING: Multiple lengths were specified for the BY variable IDNAME by input data sets. This
         might cause unexpected results.
NOTE: MERGE statement has more than one data set with repeats of BY values.
NOTE: There were 250 observations read from the data set M.REFERENCE_COLUMNS.
NOTE: There were 172 observations read from the data set WORK.APT_RENT_ID.
NOTE: There were 245 observations read from the data set WORK.CAR_PER_PP_ID.
NOTE: There were 245 observations read from the data set WORK.HH_TYPE_ID.
NOTE: There were 320 observations read from the data set WORK.N_HOUSES_ID.
NOTE: There were 245 observations read from the data set WORK.N_MANUFACT_ID.
NOTE: There were 245 observations read from the data set WORK.N_SERVICE_COMP_ID.
NOTE: There were 245 observations read from the data set WORK.N_STUDENTS_PER_CLASS_ID.
NOTE: There were 245 observations read from the data set WORK.N_WELFARE_FACILITY_ID.
NOTE: There were 245 observations read from the data set WORK.RATIO_HEALTH_WELFARE_ID.
NOTE: There were 244 observations read from the data set WORK.URBANICITY_ID.
NOTE: There were 245 observations read from the data set WORK.WORKERS_PER_PP_ID.
NOTE: There were 249 observations read from the data set WORK.DISABILITY_ID.
NOTE: There were 228 observations read from the data set WORK.GDP_ID.
NOTE: There were 250 observations read from the data set WORK.INSURED_POP_ID.
NOTE: There were 228 observations read from the data set WORK.N_EMPLOYEES_ID.
NOTE: There were 228 observations read from the data set WORK.RETAIL_WORKERS_ID.
NOTE: There were 264 observations read from the data set WORK.VAC_RATE_ID.
NOTE: The data set WORK.ALLDATA_ID has 383 observations and 39 variables.
NOTE: DATA statement used (Total process time):
      real time           0.04 seconds
      cpu time            0.03 seconds
1 ACCEPTED SOLUTION

Accepted Solutions
Kurt_Bremser
Super User

Step 1: get rid of Excel files for reading data into SAS. Instead convert the files to CSV or another textual format first. Tools like LibreOffice allow batch-conversion from, say, XLSX to CSV. All this, of course, if your data originates from Excel in the first place, which is unlikely.

Step 2: read those files with tailored data steps where you have full control over variable attributes.

View solution in original post

7 REPLIES 7
ChrisNZ
Tourmaline | Level 20

> Is there any way to standardize the length of ID1NAME & IDNAME ... would be tedious

Short answer: No there's no non-tedious way.

You have a data quality problem and the source data should be painstakingly standardised by hand.

 

There is a lazy way out though: Add something like

 

length id1name idname $20;

 

at the top of data step. 

This does not replace vetting and cleaning the source data.

SASKiwi
PROC Star

@Cruise  - You should also take note of this:

NOTE: MERGE statement has more than one data set with repeats of BY values.

That means more than one of your input datasets has duplicate BY variable values. The MERGE statement cannot handle this, so the duplicate rows on your second and subsequent datasets will be dropped. Is this what you want?

Cruise
Ammonite | Level 13
Thank you so much for the pointer. This means I have to investigate the source of duplicates. Is there a way to know which particular datasets had duplicates that needs my investigation in the data merge process? How did you know that there were duplicates? These are invaluable info to me.
Kurt_Bremser
Super User

The many-to-many might be caused by the length issue. Suppose something has an ID code of 12345X, and something else (in a later run) 12345XY. Since after the first import the code will be limited to a length of 6, the Y will be cut off when merging. Like

data have1;
input ID $5.;
datalines;
12345X
12345X
;

data have2;
input ID $6.;
datalines;
12345X
12345XY
;

data want;
merge
  have1
  have2
;
by id;
run;
Kurt_Bremser
Super User

THE way for treating this is to fix your import process(es). Variable attributes must be standardized along their documented values when the data is read into SAS. Do NOT use PROC IMPORT and/or Excel files.

Cruise
Ammonite | Level 13
This is what happened. I used PROC IMPORT to import excel variables. Because I couldn't figure out the other way around.
Kurt_Bremser
Super User

Step 1: get rid of Excel files for reading data into SAS. Instead convert the files to CSV or another textual format first. Tools like LibreOffice allow batch-conversion from, say, XLSX to CSV. All this, of course, if your data originates from Excel in the first place, which is unlikely.

Step 2: read those files with tailored data steps where you have full control over variable attributes.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 1164 views
  • 7 likes
  • 4 in conversation