I want to merge 20 datasets with the same variable names - three character variables and one numeric.
When execute this code:
data COMB;
set COMB01 COMB02 COMB03 COMB04 COMB06 COMB07 COMB08 COMB09 COMB11 COMB12 COMB13 COMB14 COMB16 COMB17 COMB18 COMB19 COMB21 COMB22 COMB23 COMB24;
run;
I get duplicate registers from the characters variables with different and incorrect values in the numeric variable.
Is 20 datasets too much for this comand?
The right solution is to go back to the steps that created the 20 input datasets and make sure that you define the variables consistently. For example instead of using PROC IMPORT to read a text file write your own data step so that you have control over how the variables are defined.
But for your simple application with ONLY FOUR VARIABLES you can just fix it in this step by adding a LENGTH statement before the SET statement to define all four of the variables. That will let you define the ORDER, the TYPE and the storage LENGTH of each variable. You migth want to also set the FORMAT attached to the variable to avoid the problem that the character values look truncated because the display width is shorter than the storage length.
Let's assume the other two variables are numeric then the code might look like this
data COMB;
length var1 8 ACTIVITY $30 COUNTERPART_AREA $50 var2 8 ;
set COMB01-COMB09 COMB11-COMB14 COMB16-COMB19 COMB21-COMB24;
format var2 date9. var1 activity counterpart ;
run;
PS What you are doing is not "merging" the datasets. You are just concatenating (or appending) the datasets.
The code you posted should work fine, if the datasets have the same variables defined the same way.
You can simplify the code by using dataset lists.
data COMB;
set COMB01-COMB09 COMB11-COMB14 COMB16-COMB19 COMB21-COMB24;
run;
No idea what a "duplicate registers" means. Can you share the SAS log for this step with all of the notes/warnings/errors that SAS generates?
Here it is the log
766 /*Junta todas as COMB*/
767 data COMB;
768 set COMB01 COMB02 COMB03 COMB04 COMB06 COMB07 COMB08 COMB09 COMB11 COMB12 COMB13 COMB14 COMB16 COMB17 COMB18 COMB19
768 ! COMB21 COMB22 COMB23 COMB24;
769 run;
WARNING: Multiple lengths were specified for the variable ACTIVITY by input data set(s). This can cause truncation of data.
WARNING: Multiple lengths were specified for the variable COUNTERPART_AREA by input data set(s). This can cause truncation of data.
NOTE: There were 165 observations read from the data set WORK.COMB01.
NOTE: There were 330 observations read from the data set WORK.COMB02.
NOTE: There were 330 observations read from the data set WORK.COMB03.
NOTE: There were 165 observations read from the data set WORK.COMB04.
NOTE: There were 330 observations read from the data set WORK.COMB06.
NOTE: There were 660 observations read from the data set WORK.COMB07.
NOTE: There were 638 observations read from the data set WORK.COMB08.
NOTE: There were 231 observations read from the data set WORK.COMB09.
NOTE: There were 22 observations read from the data set WORK.COMB11.
NOTE: There were 44 observations read from the data set WORK.COMB12.
NOTE: There were 44 observations read from the data set WORK.COMB13.
NOTE: There were 22 observations read from the data set WORK.COMB14.
NOTE: There were 11 observations read from the data set WORK.COMB16.
NOTE: There were 22 observations read from the data set WORK.COMB17.
NOTE: There were 22 observations read from the data set WORK.COMB18.
NOTE: There were 11 observations read from the data set WORK.COMB19.
NOTE: There were 3718 observations read from the data set WORK.COMB21.
NOTE: There were 5247 observations read from the data set WORK.COMB22.
NOTE: There were 770 observations read from the data set WORK.COMB23.
NOTE: There were 440 observations read from the data set WORK.COMB24.
NOTE: The data set WORK.COMB has 13222 observations and 4 variables.
NOTE: DATA statement used (Total process time):
32 The SAS System 16:34 Thursday, March 30, 2023
real time 0.01 seconds
cpu time 0.01 seconds
Thank you for your help.
See this post for how to see what the issue is and resolve it.
The right solution is to go back to the steps that created the 20 input datasets and make sure that you define the variables consistently. For example instead of using PROC IMPORT to read a text file write your own data step so that you have control over how the variables are defined.
But for your simple application with ONLY FOUR VARIABLES you can just fix it in this step by adding a LENGTH statement before the SET statement to define all four of the variables. That will let you define the ORDER, the TYPE and the storage LENGTH of each variable. You migth want to also set the FORMAT attached to the variable to avoid the problem that the character values look truncated because the display width is shorter than the storage length.
Let's assume the other two variables are numeric then the code might look like this
data COMB;
length var1 8 ACTIVITY $30 COUNTERPART_AREA $50 var2 8 ;
set COMB01-COMB09 COMB11-COMB14 COMB16-COMB19 COMB21-COMB24;
format var2 date9. var1 activity counterpart ;
run;
PS What you are doing is not "merging" the datasets. You are just concatenating (or appending) the datasets.
How exactly do you know that the numeric variable is "incorrect". Be very explicit and include the FORMAT assigned to the resulting variable.
When asking questions about any result it is best practice to copy the LOG, include the code submitted along with ALL notes, messages, warnings or errors. On the forum open a text box using the </> icon above the message and paste all of that text. That way we don't have to ask a bunch of questions that the log answers. The text box is important to preserve the formatting of any diagnostic messages that SAS often includes as the main message windows on this forum reformat text when pasted.
Numeric values may be more problematic depending on the original values and how they were read though typically the main issue is going to be the format assigned. If you are seeing things like 1E10 or 1E-12, that is exponential notation and occurs when the value is large or very small but the default format assigned to the variable won't display all of the digits.
An example of how the format affects the appearance:
data _null_; x= 10000000000; y= 0.0000000132; file print; put 'Best5 format ' x= best5. y= best5.; put 'F6 format' x= f6. y= f6. ; put 'f8.3 format' x= f8.3 y= f8. ; put 'f28.13 format' x= f28.13 y= f28.13; put 'Best16. format' x= best16. y= best16.; put 'Best32. format' x= best32. y= best32.; run;
The results will show in the Results window.
If you have a value with more than 14 digits make sure that it should actually be a number. If the value is actually something like an identification code or account number there may be issues with things like missing leading zero and rounding of large values due to the precision of numeric storage that SAS uses.
Registration is open! SAS is returning to Vegas for an AI and analytics experience like no other! Whether you're an executive, manager, end user or SAS partner, SAS Innovate is designed for everyone on your team. Register for just $495 by 12/31/2023.
If you are interested in speaking, there is still time to submit a session idea. More details are posted on the website.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.