BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
BlueNose
Quartz | Level 8

Hello all,

 

I have 10 SAS datasets, which I want to combine together (to concatenate). Some variables appear in all files, some only in some. Each file represent a different visit in a clinical trial.

 

I had a problem with variables being numeric in one dataset and character in another. There weren't too many of them, so I dealt with it manually.

 

Now a new problem. Some character variables (usually the ones of the free text), have different length in differnt datasets, so I get lot's of warnings in the log, and my data is truncated.

 

Afte looking online, I tried the following thing, before using the SET statement, I used the length, like this:

 

 

data All;
length Var1 $ 160 Var2 $ 150;
Set DS1 DS2 DS3;
run;

 

 

So now there are no warnings anymore in the log, however, first of all, the data is still being truncated ! Secondly, when looking at the attributes of the specific variables, I see that the length was changed indeed, but the format wasn't, for example: If Var1 had length 109 in DS1 and 160 in DS2, than I do see 160 in the length, but in format I see $109, which is weird.

 

So my quesion is, assuming I have several dataset, with hundreds of variables in each, and assuming that the type is finally matching, how do I concatenate them all, if some character variables have different lengths ?

 

Thank you !

 

1 ACCEPTED SOLUTION

Accepted Solutions
Astounding
PROC Star

While you haven't shown the program/log, I suspect the issue is the positioning of the statement.  Even though the LENGTH statement must come before the SET statement, the FORMAT statement must come after the SET statement.  LENGTH uses the first setting found, but FORMAT uses the last setting found.

View solution in original post

15 REPLIES 15
Reeza
Super User
Your doing it correctly, the format is making it appear as if the data is truncated.

Change the format, similar to how you specified the length

Another trick is to append the data and making sure the base data is the dataset with the longest length/format, which sets the attributes for the final data set.
BlueNose
Quartz | Level 8
Thank you. I thought about using the longest length/format. However, I have 10 datasets, and each variable can have the longest length in a different dataset, it's too many combinations to check.
How do I change the format, let's say, for a length of 142?
RW9
Diamond | Level 26 RW9
Diamond | Level 26

You can do it in a number of ways, the SQL posted below fixes the length in the first select and all other will then apply to that.  You could also query the SASHELP.VCOLUMN to find the biggest if needed.  The question is why do you have 10 datasets with different attributes?  Is this something you have created, or do you get it from someone else?  If its the former then fix it in your input datasets, if its the later check out the data import agreement to see what the data should be like.

Although after checking, this does exactly the same thing:

data want;
  length a $50 b c 8;
  set have1 have2 have3;
run;

So I am now unsure of the question?

 

 

data have1;
  length a $5;
  a="ABC"; b=2; output;
  a="DEFGT"; b=1; output;
run;

data have2;
  length a $25;
  a="23"; b=1; output;
run;

data have3;
  c=1; b=1; output;
run;

proc sql;
  create table WANT as
  select  A length=25,
          B,
          . as C
  from    HAVE1
  union all
  select  *,
          . as C
  from    HAVE2
  union all 
  select  "" as A,
          *
  from    HAVE3;
quit;

 

BlueNose
Quartz | Level 8
This is how I got the data, I can't control it, the problem comes from free text fields, sometimes over 100 characters. I have 10 datasets with hundreds variables in each, I can't control it. I used PROC Import and told SAS to use maximum variables to define format and length. I see no other way to read the data into SAS, so now I have to "clean" it.
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Ah, so its not actually dataset we are talking about then, I am presuming Excel sheets yes?  Common problem.  Get the data in CSV, i.e. plain text comma separated variable, if you have to, do it yourself - Excel -> SaveAs and select CommaSeparated.  Now for each of the files (or you may be able to get away with one) write a basic data step import:

data mydata;
  infile "c:\abc.csv" dlm=",";
  length a $10 b $5...;
  input a $ b;
run;

What this does is read in the data correctly, proc import is a "guessing" procedure, it is trying to guess what you want to do which is never a good thing, and you end up with this length nonsense.  Specify the lengths yourself, if you don't have an import agreement then fix lengths as 200.

BlueNose
Quartz | Level 8

You are right, Excel sheets, and I did convert to CSV. I cannot use your way, because I have hundreds of variables in each file, so I will need to define them all in the length and input statements. Or am I wrong, won't I  have to do that?

RW9
Diamond | Level 26 RW9
Diamond | Level 26

That depends on the files themselves, i.e. what the variables are.  Is there a fixed number, are they all the same type - things that should be in a data import agreement.  

The problem is if you don't look at it at the import stage, your going to have to look at it somewhere - i.e. your having trouble now with lengths, what about datatypes later on, processing on the data, ...

Astounding
PROC Star

Worth testing:  I believe if you use SQL to combine the data sets it will automatically solve the length issues.  Not sure about the formats, but it's definitely worth trying.

 

There isn't much reason to apply a format to a lengthy character variable.  The best solution for conflicting formats in those cases is probably just to remove the format entirely:

 

format longvar;

 

Good luck.

BlueNose
Quartz | Level 8

Astounding, thank you. Your comment is interesting. What does format longvar does ? Will it work if my files contains both numerical variables AND characteristic variables with varying lenghts ?

 

Secondly, how do I append a few datasets (let's say 3 or 4 for the example) using SQL ?

Astounding
PROC Star

The FORMAT statement is intended as a DATA step statement.  When you don't actually supply a format name in a FORMAT statement, it's asking that the format be removed.

 

format var1 var2 var3;

 

This statement asks that any formats for VAR1, VAR2, and VAR3 be removed.

 

Others are better positioned than myself to show you how to concatenate data sets in SQL.

 

Good luck.

BlueNose
Quartz | Level 8

Unfortunatelly something is wrong, I tried adding the format statement with the variable name and not specifying a format, and again, the length is now 200, and the format is $109. The data is truncated.

Astounding
PROC Star

While you haven't shown the program/log, I suspect the issue is the positioning of the statement.  Even though the LENGTH statement must come before the SET statement, the FORMAT statement must come after the SET statement.  LENGTH uses the first setting found, but FORMAT uses the last setting found.

BlueNose
Quartz | Level 8
You are right ! I moved the format to after the set, and now the format is $200, and my data is no longer truncated !

The informat is still 109, does it matter ?
Astounding
PROC Star

It's unlikely to hurt, as you are unlikely to add observations at this point.  Still, it would be safer to remove the  informats.  If you have a FORMAT statement that removes formats, it's should be easy to construct an INFORMAT statement as well.

hackathon24-white-horiz.png

The 2025 SAS Hackathon has begun!

It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.

Latest Updates

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 15 replies
  • 7174 views
  • 7 likes
  • 5 in conversation