Concatenating issues

03-14-2013 02:07 PM

Friends, can you explain the difference between these two concatenation methods, with an example?

1) data test;

set a b;

run;

2) data test;

set a;

set b;

run;


03-14-2013 02:25 PM

Run this:

data a;

do i=1 to 10;

output;

end;

run;

data b;

do j=1 to 10;

output;

end;

run;

data long;

set a b;

run;

data wide;

set a;

set b;

run;

And tell us what you think the differences are.

Haikuo


03-15-2013 02:08 AM

Dear Hai.kuo, I am not getting a clear picture from the above example. Can you please explain, if possible?


03-15-2013 02:35 AM

Hi,

Please check the datasets created by Haikuo's examples. In the long dataset, the two datasets are concatenated one below the other (vertically), so the number of observations is the sum of the observations in A and B. In the second example, wide, the datasets are combined horizontally (side by side), so the number of observations does not grow. The step stops as soon as either dataset runs out of observations, so the output has only as many observations as the smaller dataset. Here both datasets have 10 observations, so wide has 10.
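To see the two shapes side by side, a minimal sketch along these lines may help. It rebuilds Haikuo's datasets a and b and prints both results; the titles are only illustrative labels, not part of the original example.

```sas
/* Rebuild the two input datasets from Haikuo's example */
data a; do i=1 to 10; output; end; run;
data b; do j=1 to 10; output; end; run;

/* Vertical: rows of b are appended below rows of a */
data long; set a b; run;

/* Horizontal: row n of a is paired with row n of b */
data wide; set a; set b; run;

/* Print both results to compare their shapes */
title "long: set a b";
proc print data=long; run;

title "wide: set a; set b";
proc print data=wide; run;

title; /* clear the title */
```

With this data, long should have 20 rows (i is missing on the rows that came from b, and j is missing on the rows that came from a), while wide should have 10 rows with both i and j filled in.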

Thanks,

Jagadish



03-15-2013 07:10 AM

Hi Rohit,

In the example given by Hai.kuo above, the datasets long and wide clearly show the difference between the two approaches.

If you use **Set A B;**

the datasets are concatenated vertically, which means the total number of observations in the final dataset equals (number of obs in A + number of obs in B).

But in the case of

** Set A; **

** Set B; **

the datasets are combined horizontally (side by side).

In this example both datasets have 10 observations, so the final dataset also has 10 observations.

A simple merge does the same thing here, so with this data you will get the same output from a merge statement like:

Data Final;

Merge A B;

run;

For the above code, sorting is not required and there is no need for a BY variable. It will blindly merge the first obs from dataset A with the first obs from dataset B, and so on, irrespective of matching values.
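As a small illustration of "irrespective of matching values", here is a sketch with two made-up datasets (left, right, and the variables id, x, y are hypothetical names, not from the thread) whose id values do not line up. Without a BY statement, MERGE pairs rows purely by position.

```sas
/* Hypothetical data: id values deliberately do not line up */
data left;
  input id x;
  datalines;
1 10
2 20
3 30
;
run;

data right;
  input id y;
  datalines;
2 200
3 300
4 400
;
run;

/* No BY statement: row 1 is paired with row 1, row 2 with row 2, ... */
data paired;
  merge left right;
run;

proc print data=paired; run;
```

The first output row should pair x=10 with y=200 even though the ids differ, and because id exists in both datasets, the value from right (the dataset listed last) is the one kept. If the rows are meant to match on id, a BY statement with sorted inputs is the usual approach.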

Regards.

Sanjeev.K


03-16-2013 12:05 AM

Hi

data wide;

set A;

set B;

run;

data wide;

merge A B;

run;

The two steps above are not the same. With the data provided they produce the same output, but they actually behave differently.

I tried with different data, and the results are as below.

data a;

do i=1 to 12;

output;

end;

run;

data b;

do j=2 to 20 by 2;

output;

end;

run;

data long;

set a b;

run;

data wide;

set a;

set b;

run;

data wide_;

merge a b;

run;

Please find the log details below:

**25 data a;**

**26 do i=1 to 12;**

**27 output;**

**28 end;**

**29 run;**

**NOTE: The data set WORK.A has 12 observations and 1 variables.**

**NOTE: DATA statement used (Total process time):**

** real time 0.07 seconds**

** cpu time 0.03 seconds**

**30**

**31 data b;**

**32 do j=2 to 20 by 2;**

**33 output;**

**34 end;**

**35 run;**

**NOTE: The data set WORK.B has 10 observations and 1 variables.**

**NOTE: DATA statement used (Total process time):**

** real time 0.01 seconds**

** cpu time 0.01 seconds**

**36**

**37 data long;**

**38 set a b;**

**39 run;**

**NOTE: There were 12 observations read from the data set WORK.A.**

**NOTE: There were 10 observations read from the data set WORK.B.**

**NOTE: The data set WORK.LONG has 22 observations and 2 variables.**

**NOTE: DATA statement used (Total process time):**

** real time 0.03 seconds**

** cpu time 0.03 seconds**

**40**

**41 data wide;**

**42 set a;**

**43 set b;**

**44 run;**

**NOTE: There were 11 observations read from the data set WORK.A.**

**NOTE: There were 10 observations read from the data set WORK.B.**

**NOTE: The data set WORK.WIDE has 10 observations and 2 variables.**

**NOTE: DATA statement used (Total process time):**

** real time 0.04 seconds**

** cpu time 0.01 seconds**

**45**

**46 data wide_;**

**47 merge a b;**

**48 run;**

**NOTE: There were 12 observations read from the data set WORK.A.**

**NOTE: There were 10 observations read from the data set WORK.B.**

**NOTE: The data set WORK.WIDE_ has 12 observations and 2 variables.**

**NOTE: DATA statement used (Total process time):**

** real time 0.03 seconds**

** cpu time 0.03 seconds**

If the above code is executed, you will find that dataset A has 12 observations and dataset B has 10 observations.

If you use the code **set A B;** it produces 22 observations, because the code concatenates the two datasets vertically.

If you use the code **set A; set B;** it produces the wide dataset with only 10 obs. There is another thing to notice: however many observations the first dataset has, only one more observation than the second dataset contains is read from it. See the log: dataset A has 12 observations, but only 11 were read, since there are only 10 observations in the next dataset. I read the article below regarding this:

http://www2.sas.com/proceedings/forum2008/167-2008.pdf

and found that with set A; set B; each observation from B is overlaid on the corresponding observation from A in the same output row. The DATA step stops, at execution time, as soon as any SET statement tries to read past the end of its dataset. On the 11th iteration, set A reads its 11th observation, then set B finds nothing left to read, and the step ends before that row is written. So we get 11 observations read from A, 10 read from B, and 10 in the output.

Coming to the **merge A B** code: the output has 12 observations. The other dataset is simply joined on horizontally, and the step keeps going until the largest dataset is exhausted, so the number of observations is that of the largest dataset; there is no reduction.

So the two approaches do differ, as shown above.
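One way to watch this happen is to trace each iteration with PUT statements. The sketch below rebuilds the a and b datasets used above and writes _N_ and the variable values to the log at each step; the dataset name wide_trace and the message strings are arbitrary choices for illustration.

```sas
/* Rebuild the inputs: a has 12 observations, b has 10 */
data a; do i=1 to 12; output; end; run;
data b; do j=2 to 20 by 2; output; end; run;

data wide_trace;
  put "before set a: " _n_=;
  set a;                          /* reads the next observation from a */
  put "after set a:  " _n_= i=;
  set b;                          /* step ends here once b is exhausted */
  put "after set b:  " _n_= i= j=;
run;
```

In the log, iteration 11 should show an "after set a" line with i=11 but no "after set b" line, because the second SET statement hits the end of b and the step stops before the implicit output. That is why the log reports 11 observations read from A but only 10 written.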

Hope this helps

Thanks,

Jagadish
