DATA Step, Macro, Functions and more

One question about GROUPFORMAT option in MERGE...BY...

Reply
Contributor
Posts: 23

One question about GROUPFORMAT option in MERGE...BY...

Hi everyone,

Please take a look at the following excerpt of log:

Log

23   data test;

24       name="JEFFREY "; n=1; name1=name; output;

25       name="jEFFREY "; n=2; name1=name; output;

26   run;

NOTE: The data set WORK.TEST has 2 observations and 3 variables.

NOTE: DATA statement used (Total process time):

      real time           0.01 seconds

      cpu time            0.01 seconds

27

28   data test1;

29       merge test sashelp.class;

30       by name groupformat;

31       format name $upcase.;

32       if upcase(name)="JEFFREY" then put _all_;

33   run;

name=JEFFREY n=1 name1=JEFFREY Sex=  Age=. Height=. Weight=. FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=6

name=JEFFREY n=2 name1=jEFFREY Sex=M Age=13 Height=62.5 Weight=84 FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=10

NOTE: There were 2 observations read from the data set WORK.TEST.

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.TEST1 has 20 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           0.04 seconds

      cpu time            0.04 seconds


When using GROUPFORMAT option in BY statement and the $UPCASE. format on NAME, I except both value "JEFFREY" and "jEFFREY" can match with "Jeffrey".  However, "JEFFREY" failed to match "Jeffrey".

Would anyone please help me understand why this happens? I wonder what's going on in SAS behind the screen.

Thanks a lot in advance!

Best regards,

Regular Contributor
Posts: 195

Re: One question about GROUPFORMAT option in MERGE...BY...

Hi,

You have used the FORMAT statement after BY statement for which SAS is not having any impact of using GROUPFORMAToption in BY statement...To allow SAS to use the formated value of BY variables, you need to use the FORMAT statement before you use GROUPFORMAT option in BY statement...

data test1;

   format name $upcase.;

   merge test sashelp.class;

   by name groupformat;

   if upcase(name)="JEFFREY" then put _all_;

run;

I have not tried this in SAS, but should work...

-Urvish

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Posted in reply to UrvishShah

Thanks for your response, but it's not because of the FORMAT statement.

Log:

4632  data test1;

4633     format name $upcase.;

4634     merge test sashelp.class;

4635     by name groupformat;

4636     if upcase(name)="JEFFREY" then put _all_;

4637  run;

name=JEFFREY n=1 name1=JEFFREY Sex=  Age=. Height=. Weight=. FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=6

name=JEFFREY n=2 name1=jEFFREY Sex=M Age=13 Height=62.5 Weight=84 FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=10

NOTE: There were 2 observations read from the data set WORK.TEST.

NOTE: There were 19 observations read from the data set SASHELP.CLASS.

NOTE: The data set WORK.TEST1 has 20 observations and 7 variables.

NOTE: DATA statement used (Total process time):

      real time           0.08 seconds

Regular Contributor
Posts: 195

Re: One question about GROUPFORMAT option in MERGE...BY...

I think you don't need to use GROUPGORMAT option...Join two datasets by using proc sql

data test;

  name = "JEFFERY"; n = 1; name1 = name; output;

  name = "jEFFERY"; n = 2; name1 = name; output;

run;

proc sql;

  create table both as

  select a.*,b.n

  from sashelp.class as a, test as b

  where propcase(a.name) = propcase(b.name);

quit;

If still your requirement is not meeting then request you to post proc format code that you have used to create your $upcase format...And also post output you want by combining datasets insted of SAS Log notes...

Thanks,

Urvish

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Posted in reply to UrvishShah

Thank you Urvish!

And I know we can use PROC SQL and some other ways to do the merge.

My real purpose is not to merge two data sets to get a certain result. My purpose is to get to know why the GROUPFORMAT + FORMAT failed to generate the expected result. There should be something that I didn't correctly understand.

$UPCASE is an internal format in SAS.

Thanks again!

Regular Contributor
Posts: 195

Re: One question about GROUPFORMAT option in MERGE...BY...


Hi,

Good Morning if there is morning...

In your main post, you wrote -

name=JEFFREY n=1 name1=JEFFREY Sex=  Age=. Height=. Weight=. FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=6

name=JEFFREY n=2 name1=jEFFREY Sex=M Age=13 Height=62.5 Weight=84 FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=10

Here, name1 variable has not associated any format say $upcase.

And as per your code, you only associate the format to NAME  variable and used as primary key during merging process...You have not used NAME1  variable as key variable for merging the datasets...

If my understanding about SAS backend process is in line with you then use the following programme...

data one;

  format name $upcase.;

  name = "JEFFREY"; n = 1; name1 = name; output;

  name = "jEFFREY"; n = 2; name2 = name; output;

data two;

  format name $upcase.;

  set sashelp.class;

run;

proc sort data = one;

  by name;

run;

proc sort data = two;

  by name;

run;

data both;

  format name name1 $upcase.;

  merge one(in = a) two(in = b);

  by name GROUPFORMAT;

  if upcase(name) = "JEFFREY" then put _all_;

run;

You have not associated the format to name1 variable and SAS used the formated values of variable for BY group processing and that is the reason we get excate match for NAME variable during merging...

Hope this time you get your result...

Regards,

Urvish

 

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Posted in reply to UrvishShah

Hi Urvish:

Good morning and thanks for your reply.

We know that in sashelp.class the value is "Jeffrey" for NAME.

The purpose of using variable NAME1 is to store the unformatted value of name, and show in the log which value ('jEFFREY') can be successfully merged to 'Jeffrey' and which ('JEFFREY') cannot.

name=JEFFREY n=1 name1=JEFFREY Sex=  Age=. Height=. Weight=. FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=6

name=JEFFREY n=2 name1=jEFFREY Sex=M Age=13 Height=62.5 Weight=84 FIRST.name=1 LAST.name=1 _ERROR_=0 _N_=10

I expected both values can be merged to "Jeffrey" when using GROUPFORMAT + FORMAT, but it turned out wrong.  So I am not looking for a way to merge datasets, I am looking for a reason why this method didn't work.

Thanks!

Regular Contributor
Posts: 195

Re: One question about GROUPFORMAT option in MERGE...BY...

I have tried with your code... and also read in SAS Docs and most of the people uses the GROUPFORMAT in SET statement with NOSORTED option in order to manipulate the FIRST.variable and LAST.variable on the basis of formatted values of BY variable insted of actual values...

But as per your example, you said it is merging without causing any error...so i think someone from the community having good command on SAS Backend process, can give his or her input to solve it...

As per my understanding, while merging, SAS retains the values of BY variable in PDV rather than assign it to missing values at the top of data step during each iteration of data step...And in your first comment it seems like SAS is using formatted values of NAME and assign missing values at the top of the data step...

And in your second example, SAS is retaining values of variable during each iteration of data step...

So it's strange...but will try until i get solution...

-Urvish

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Posted in reply to UrvishShah

Hi Urvish,

You are right, most of the times GROUPFORMAT would be used with SET statement.  Actually I used GROUPFORMAT for only a few times. I ran into this issue when trying to answer someone else's question about merging data sets.

While merging, variables from data sets would not be set to missing at the beginning of each iteration of data step. Of course, at the beginning of the first iteration, all variables would be initialized as missing when no data has been read from data sets.

Thank you very much for spending time on this!

Super User
Super User
Posts: 7,060

Re: One question about GROUPFORMAT option in MERGE...BY...

You cannot use it to MERGE by formatted values, it will still merge by the actual values.

Really only useful to when setting a single dataset and you want to group based on formatted instead of actual values.

Plus in your example since you are changing the case of some of the values you would also need to add the NOTSORTED keyword or else the data step will complain that "jeffery" comes after "THOMAS" in alphabetic order.  And you cannot use NOTSORTED with more than one dataset.

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Thanks Tom!

But please try the following example. All observations can merge successfully and no complain about the alphabetic order.

Thanks for your time!

data test;

     input name $8.;

  n=_n_;

  cards;

henry

HENRY

Henry

HENRY

HenRY

HENry

jeffrey

THOMAS

;

data test1;

     format name $upcase.;

     merge test sashelp.class(in=okay);

     by name groupformat;

     if not okay then put "Observation " n "is not okay!" name= $8.;

run;

Super User
Super User
Posts: 7,060

Re: One question about GROUPFORMAT option in MERGE...BY...

Not sure why that one works and this one doesn't.

data one ;

  input id $ @@ ;

  copy = id ;

  format id $upcase. ;

  put (_all_) (=);

cards;

a B b c D e

run;

data two ;

  input id $ @@ ;

  copy2 = id ;

  format id $upcase. ;

  put (_all_) (=);

cards;

A b d

run;

data check;

  merge one (in=in1) two(in=in2)   ;

  by id groupformat ;

  put (_n_ _all_ in1 in2 first.id last.id) (=) ;

run;


_N_=1 id=A copy=a copy2=A in1=1 in2=1 FIRST.id=1 LAST.id=1

_N_=2 id=B copy=B copy2=b in1=1 in2=1 FIRST.id=1 LAST.id=0

_N_=3 id=B copy=b copy2=b in1=1 in2=1 FIRST.id=0 LAST.id=1

ERROR: BY variables are not properly sorted on data set WORK.ONE.

in1=1 in2=0 id=C copy=c copy2=  FIRST.id=0 LAST.id=1 _ERROR_=1 _N_=4

NOTE: The SAS System stopped processing this step because of errors.

NOTE: There were 5 observations read from the data set WORK.ONE.

NOTE: There were 3 observations read from the data set WORK.TWO.

WARNING: The data set WORK.CHECK may be incomplete.  When this step was stopped there were 3 observations and 3 variables.

WARNING: Data set WORK.CHECK was not replaced because this step was stopped.

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Yeah, this is totally beyond my understanding.

BTW, there will be no ERROR if you add a 'd' in your "data one;".

data one ;

  input id $ @@ ;

  copy = id ;

  format id $upcase. ;

  put (_all_) (=);

cards;

a  B b c d D e

run;

Thanks again!

Regular Contributor
Posts: 195

Re: One question about GROUPFORMAT option in MERGE...BY...

Hi,

I tried this example but i got ERROR about BY variables are not properly sorted...

As per the documentation, if you are using GROUPFORMAT option in BY statement, you need to sort or index the dataset first...

SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition

I hope this helps you clear doughts..

-Urvish

Contributor
Posts: 23

Re: One question about GROUPFORMAT option in MERGE...BY...

Posted in reply to UrvishShah

Hi Urvish,

I am using SAS 9.1.3 and I didn't get any ERROR message on this example.

Thanks!

Ask a Question
Discussion stats
  • 14 replies
  • 615 views
  • 0 likes
  • 3 in conversation