
concatenating two data sets

Frequent Contributor
Posts: 126


Hi, 

I'm trying to combine two datasets sequentially. The variable lengths differ between them; the data sets did combine, but when I tried to print the result I got an error.

This is the error that I got:

 
ERROR: Invalid characters were present in the data.
ERROR: An error occurred while processing text data.

 

Below are the log and the contents of each dataset.

 

 
 
 
189
190 data mm1;
191 set oricclose harmclose_O;
192 run;
 
WARNING: Multiple lengths were specified for the variable Query_Title by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable Verbatim_Terms by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable Preferred_Terms by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable Event_Start_Date by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable Initiated_User by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable Last_Query_Entry by input data set(s). This can cause truncation of data.
NOTE: There were 1083 observations read from the data set WORK.ORICCLOSE.
NOTE: There were 677 observations read from the data set WORK.HARMCLOSE_O.
NOTE: The data set WORK.MM1 has 1760 observations and 21 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
memory 4971.28k
OS Memory 36784.00k
Timestamp 05/09/2018 01:46:16 PM
Step Count 177 Switch Count 2
Page Faults 0
Page Reclaims 860
Page Swaps 0
Voluntary Context Switches 11
Involuntary Context Switches 0
Block Input Operations 0
Block Output Operations 4104
 
 
193
194 proc print data=mm1; run;
 
ERROR: Invalid characters were present in the data.
ERROR: An error occurred while processing text data.
NOTE: The SAS System stopped processing this step because of errors.
NOTE: There were 1760 observations read from the data set WORK.MM1.
NOTE: PROCEDURE PRINT used (Total process time):
real time 1.82 seconds
user cpu time 1.82 seconds
system cpu time 0.00 seconds
memory 4050.43k
OS Memory 35240.00k
Timestamp 05/09/2018 01:46:18 PM
Step Count 178 Switch Count 0
Page Faults 0
Page Reclaims 839
Page Swaps 0
Voluntary Context Switches 0
Involuntary Context Switches 61
Block Input Operations 0
Block Output Operations 600
 
 
oricclose contents

Alphabetic List of Variables and Attributes

Variable          Type  Len   Format      Informat  Label
Assigned_User     Char  43    $43.        $43.      Assigned User
Category          Char  19    $19.        $19.      Category
Closed_Date       Num   8     YYMMDD10.             Closed Date
Country           Char  32    $32.        $32.      Country
Created_Date      Num   8     YYMMDD10.             Created Date
Days_Open         Char  15    $15.        $15.      Days Open
Event_Start_Date  Char  71    $71.        $71.      Event Start Date
Initiated_User    Char  24    $24.        $24.      Initiated User
Last_Action_Date  Num   8     YYMMDD10.             Last Action Date
Last_Query_Entry  Char  423   $423.       $423.     Last Query Entry
Location          Char  15    $15.        $15.      Location
PAC_No            Char  17    $17.        $17.      PAC No
Preferred_Terms   Char  124   $124.       $124.     Preferred Terms
Query_ID          Char  15    $15.        $15.      Query ID
Query_Title       Char  53    $53.        $53.      Query Title
Site_Number       Char  15    $15.        $15.      Site Number
Status            Char  15    $15.        $15.      Status
Subject_Number    Char  22    $22.        $22.      Subject Number
Verbatim_Terms    Char  205   $205.       $205.     Verbatim Terms
Visit_Name        Char  15    $15.        $15.      Visit Name
protocol          Char  15

HARMCLOSE_O contents

Alphabetic List of Variables and Attributes

Variable          Type  Len   Format      Informat  Label
Assigned_User     Char  43    $43.        $43.      Assigned User
Category          Char  15    $15.        $15.      Category
Closed_Date       Num   8     DATE9.                Closed Date
Country           Char  25    $25.        $25.      Country
Created_Date      Num   8     DATE9.                Created Date
Days_Open         Char  15    $15.        $15.      Days Open
Event_Start_Date  Char  239   $239.       $239.     Event Start Date
Initiated_User    Char  25    $25.        $25.      Init User
Last_Action_Date  Num   8     DATE9.                Last Action Date
Last_Query_Entry  Char  1072  $1072.      $1072.    Last Query Details
Location          Char  15    $15.        $15.      Location
PAC_No            Char  15    $15.        $15.      PAC No
Preferred_Terms   Char  914   $914.       $914.     Preferred Terms
Query_ID          Char  15    $15.        $15.      Query ID
Query_Title       Char  254   $254.       $254.     Query Title
Site_Number       Char  15    $15.        $15.      Site Number
Status            Char  15    $15.        $15.      Status
Subject_Number    Char  22    $22.        $22.      Subject Number
Verbatim_Terms    Char  741   $741.       $741.     Verbatim Terms
Visit_Name        Char  15    $15.        $15.      Visit Name
protocol          Char  15
 

 
SAS Super FREQ
Posts: 9,257

Re: concatenating two data sets

Hi:
I would fix the length issue first. You can do that by simply using a LENGTH statement BEFORE any SET statement in your code. Make sure that the LENGTH statement declares a size big enough for the longest value.
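
Applied to the two data sets in this thread, the LENGTH-before-SET approach might look like the sketch below. The lengths are simply the larger of the two values shown in the PROC CONTENTS listings for the six variables named in the WARNING messages. The FORMAT statement is an extra assumption on my part: the stored $w. formats from oricclose (for example $53. on Query_Title) would otherwise still be attached and could clip the longer values when printed.

```sas
data work.mm1;
   /* Declare the longer length for each variable flagged in the log. */
   /* This must come BEFORE the SET statement to take effect.         */
   length Query_Title      $254
          Verbatim_Terms   $741
          Preferred_Terms  $914
          Event_Start_Date $239
          Initiated_User   $25
          Last_Query_Entry $1072;
   set work.oricclose
       work.harmclose_O;
   /* Assumption: remove the inherited $w. formats on these variables */
   /* so the shorter widths from oricclose do not truncate the output */
   format Query_Title Verbatim_Terms Preferred_Terms
          Event_Start_Date Initiated_User Last_Query_Entry;
run;
```

One side effect to be aware of: variables named in the LENGTH statement move to the front of the variable order in work.mm1.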

Then, after you fix the length, work on the invalid character issue. This could be due to an encoding issue on your system:
https://communities.sas.com/t5/ODS-and-Base-Reporting/ERROR-Invalid-characters-were-present-in-the-d... or
https://communities.sas.com/t5/SAS-Analytics-U/PROC-ANOVA-ERROR-invalid-characters-were-present-in-t... or
https://communities.sas.com/t5/General-SAS-Programming/French-Accents-in-SAS/td-p/316850

The common issue in all of the above postings was the fact that the data had characters that could not be processed by SAS. So that means you'll need to either work with the people who sent you the data or change your settings/language/encoding to allow you to read the data correctly.

It is possible that the data is in some other encoding, in which case, you might have to find out from the people who sent you the data, what encoding or language or character set to use.
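
A quick way to start on the encoding question is to compare the encoding of your SAS session with the encoding recorded in each data set. The sketch below uses the data set names from this thread and assumes both are in WORK; the data set encoding is shown in the header portion of the PROC CONTENTS output.

```sas
/* Show the encoding of the current SAS session in the log */
proc options option=encoding;
run;
%put NOTE: Session encoding is &sysencoding;

/* The "Encoding" attribute in the PROC CONTENTS header shows */
/* the encoding each data set was created with                */
proc contents data=work.oricclose;
run;
proc contents data=work.harmclose_O;
run;
```

If the two data sets report different encodings, or either differs from the session encoding, that is a good detail to bring to the people who supplied the data.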

But first, I recommend fixing the length issue.

Cynthia
Frequent Contributor
Posts: 126

Re: concatenating two data sets

Posted in reply to Cynthia_sas

I use the same datasets in another program and never had this error, but when I try to combine them I get it.

 

SAS Super FREQ
Posts: 9,257

Re: concatenating two data sets

Then you have to run PROC CONTENTS on each dataset separately to resolve the difference in lengths. If you compare the variables, you should find that the variables listed in the log have different lengths in the files you are bringing together. The message you see is just a WARNING that possible truncation could occur. You do NOT have to fix the lengths, if you can live with the WARNING messages. However, it is a best practice to fix the length issue to avoid the WARNING.
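
If comparing two PROC CONTENTS listings by eye is tedious, one alternative (not the only way) is to query DICTIONARY.COLUMNS and list only the variables whose lengths differ. This sketch assumes both data sets are in the WORK library; the column aliases len_oric and len_harm are just names I made up for the report.

```sas
proc sql;
   /* List variables that exist in both data sets with different lengths */
   select a.name,
          a.type,
          a.length as len_oric,
          b.length as len_harm
      from dictionary.columns as a
           inner join
           dictionary.columns as b
           on upcase(a.name) = upcase(b.name)
      where a.libname = 'WORK' and a.memname = 'ORICCLOSE'
        and b.libname = 'WORK' and b.memname = 'HARMCLOSE_O'
        and a.length ne b.length;
quit;
```

Note that LIBNAME and MEMNAME values in the dictionary tables are stored in upper case, which is why the data set names are written that way in the WHERE clause.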

For the character issue, without actually seeing the data or understanding what the exact cause of the invalid character message is, it is impossible to comment. You might open a track with Tech Support on the invalid character issue but you will need to be able to send them your data and help them figure out things like your language setting and encoding setting.

Cynthia
Super User
Posts: 13,034

Re: concatenating two data sets


@mona4u wrote:

I use the same datasets in another program and never had this error, but when I try to combine them I get it.

 


Statements like:

WARNING: Multiple lengths were specified for the variable Query_Title by input data set(s). This can cause truncation of data.

are not errors. They are telling you that you may have data issues after combining the data.

 

Here is a brief demonstration with a fairly obvious result:

data work.data1;
   x="Short text";
run;
data work.data2;
   x="Some much longer text in this field";
run;

data work.combined;
   set work.data1
       work.data2
   ;
run;
proc print data=work.combined;
run;

which shows that the value of X was truncated when the data sets were combined this way. The correction, as @Cynthia_sas said, is easy:

 

data work.combined;
   /* the 35 below comes from the length of X in data2*/
   length x $ 35; 
   set work.data1
       work.data2
   ;
run;
proc print data=work.combined;
run;

If the data sets have different encodings, the invalid-character issue might not appear until you attempt to combine the data, because a single data set can only have one encoding.

 

Super User
Posts: 22,848

Re: concatenating two data sets

It happens when you're dealing with a text variable that contains characters SAS can't process in the current encoding.

 

I seem to get it more in SAS UE than on my desktop edition. 

 

If you use the exact same data set but don't work with that variable, the error does not appear. And if you filter out the offending records, you don't get the issue either.
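
To find the offending records, one rough approach is to scan every character variable for bytes outside the printable ASCII range. This is only a sketch: work.suspect and bad_var are hypothetical names, the regular expression treats every non-ASCII byte as suspicious (so legitimate accented characters will be flagged too), and it assumes the DATA step itself can read the values without raising the error.

```sas
data work.suspect;
   set work.mm1;
   /* All character variables read from work.mm1 */
   array cvars {*} _character_;
   /* Declared after the array, so bad_var is not part of it */
   length bad_var $32;
   do i = 1 to dim(cvars);
      /* Match any byte that is not a printable ASCII character */
      if prxmatch('/[^\x20-\x7E]/', trim(cvars{i})) then do;
         bad_var = vname(cvars{i});   /* first variable that matched */
         output;
         leave;                       /* one output row per record   */
      end;
   end;
   drop i;
run;

proc freq data=work.suspect;
   tables bad_var;
run;
```

The PROC FREQ step only tabulates variable names, so it should be safe to run even if printing the text values themselves fails.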

 

I usually get this when I'm working with the Canadian Census data, where the French names have accented characters that are problematic.
