Solved: Re: Merge

robertrao · Posted 10-18-2012 02:35 PM

Hi,

I have 4 datasets created after running a macro.

All 4 have 2 variables.

category value

I want to merge by CATEGORY.and rename the value in each of the 4 datasets so that all of them are side by side??

Regards

Reeza · Posted 10-18-2012 04:07 PM

My guess is you're using the macro to create a table of characteristics for different groups, ie control and intervention if it was a clinical trial.

My suggestion is to build that capability into the macro rather than modify the end results. Though what I've done, when in a rush is output it all to excel and merge it manually (eyeballing) it.

It needs to get added in to a manuscript usually so does require doing some manual work anyways.

View solution in original post

Haikuo · Posted 10-18-2012 02:41 PM

how about :

data want;

merge h1(rename=value=value1) h2(rename=value=value2) h3(rename=value=value3) h4(rename=value=value4);

by category;

run;

Haikuo

robertrao · Posted 10-18-2012 02:49 PM

That was a quick response. I got the answer I wanted because i was wanting to it 4 different steps.

But I have a small issue . I donot want to disturb the order of the category while I merge. So I merge with out the sorting of the cotegorical variable

I get this error!!!1

3093 data combine;
3094 merge STATC_OUT(rename=value=value1) STATQ_OUT(rename=value=value2)
3094! STATCQ_OUT(rename=value=value3) statCQN_OUT(rename=value=value4);
3095 by categorical;
3096 run;

ERROR: BY variables are not properly sorted on data set WORK.STATC_OUT.
categorical=Range value1=16.0 - 77.0 value2=16.0 - 72.0 value3=16.0 - 77.0 value4=2.0 - 86.0
FIRST.categorical=1 LAST.categorical=1 _ERROR_=1 _N_=5

robertrao · Posted 10-18-2012 02:59 PM

i have something like this(showing the first dataset here)

categorical value

AGE

count(missing) 800

mean 20.3

Range 13

SEX

Female 500

Male 300

RACE

aSIAN 400

HISPANIC 400

Haikuo · Posted 10-18-2012 03:24 PM

You have to sort them before using 'merge by'. proc sql does not require presorting, but 1.) it is different to data step merge, so it may not produce what you want. 2) It is still not guarantee the order unless you using monotonic() in one of the inputing datasets.

So for your problem, if you are certain that all 4 of your incoming tables have exact structure row by row, same amount of total obs and same amount of obs for every category, and at the same order, simply put they are the same tables except the value contents, then you can use:

1. merge without by variable

2. set.

Haikuo

Astounding · Posted 10-18-2012 03:27 PM

Yes, or to put it another way ...

If thy BY offends thee, pluck it out.

It sounds like all 4 data sets have the same categories in the same order. So suggestion #1 seems appropriate.

robertrao · Posted 10-18-2012 03:34 PM

Hiiiiiiiiii

I went for it without the BY.

Unfortunately, for example all 106 had no AIDS, so AIDS+ row is eliminated in one dataset.Maybe in the other some people had AIDS and that adds to an extra row which is messing up things!! Any suggestions????

Categorical value

AIDS

AIDS- 106

AIDS+

In other words 3 data sets of the 4 are having 55 observations but the odd one has 49. Thats messing up the merge

Astounding · Posted 10-18-2012 03:49 PM

Unfortunately, you have theoretically defined a problem without a unique solution. Look at this possiblity for CATEGORICAL:

dataset 1:

AIDS

AIDS-

dataset 2:

AIDS-

AIDS+

dataset 3:

AIDS+

What would the proper order be, when you combine them? There are a few possibilities, which means the computer shouldn't decide ... you will have to. One approach would create a format, such as:

value $order

'AIDS'='01'

'AIDS+'='02'

...

Typing this out would be tedious, but you don't need to type it out if you would like to designate one of your "complete" data sets as having the "proper" order. Use that data set to define the order:

data proper_order;

set dataset_1;

retain fmtname '$order';

start=categorical;

label = put(_n_, z3.); /* just in case you have more than 100 categories one day */

run;

proc format cntlin=proper_order;

run;

Then apply the format, assigning an order variable to each data set. Illustrating with just dataset_4:

data dataset_4;
set dataset_4;

order = put(categorical, $order.);

run;

Finally, sort and merge BY ORDER.

Good luck.

Reeza · Posted 10-18-2012 03:52 PM

Look into adding the sparse option in the proc freq, that way you should have an option for each group

Or look at changing the macro to accommodate by groups on a different level, ie with each result before it stacks them together. I'd started that a bunch of times, but never had a chance to complete it.

robertrao · Posted 10-18-2012 04:00 PM

BTW I am using the macro you provided me. I will look into the sparse option of the proc freq and see

I dint understand your second sentence.

Regards

Reeza · Posted 10-18-2012 04:07 PM

My guess is you're using the macro to create a table of characteristics for different groups, ie control and intervention if it was a clinical trial.

My suggestion is to build that capability into the macro rather than modify the end results. Though what I've done, when in a rush is output it all to excel and merge it manually (eyeballing) it.

It needs to get added in to a manuscript usually so does require doing some manual work anyways.

robertrao · Posted 10-18-2012 04:36 PM

Hi,

In the macro for proc freq I tried to incorporate the SPARSE option and it does nt work for me...

Reeza · Posted 10-18-2012 05:01 PM

What version of SAS do you have?

robertrao · Posted 10-18-2012 05:27 PM

9.1

robertrao · Posted 10-19-2012 01:09 AM

Hi,

I went to export each of the 4 datasets and merged manually in Excel and then imported Back.

Thanks for the help

SAS Innovate 2025: Save the Date