Using Proc Format for Merging Data

Reply
N/A
Posts: 0

Using Proc Format for Merging Data

Hi
I have known that it's possible to use proc format for merging data.
I am trying to manage it at the moment, but can't understand it completely.
Please, help me if somebody is familiar with that.

It's too much text to explain what merging I need, so I just write sql query, that I need to perform using data step + proc format:

create table testtable as (select K.n, sum(P.b), count(P.a)
from P right join K on K.a = P.a
where K.n in (11, 12, 17, 21)
group by K.n);

all the fields are numeric


Message was edited by: Hippo
N/A
Posts: 0

Re: Using Proc Format for Merging Data

I'll try digesting into base sas code with summary
Assuming A is numeric ;
*we want total of B for each A in P, before looking at columns in K ;
* create look-up from A to sum-of-B-within-A;
*1 create sum-of-B-within-A ;[pre]proc summary data= P missing ;
class A ;
var B ;
output sum= out= sumB ;
run;[/pre]
*2 adapt to layout for proc format cntlin= ;[pre]data cntl ;
retain fmtname 'statB' ;
set sumB( rename=( A=start B=label ) ) ;
keep fmtname start label ;
run;[/pre]
*3 build format ;[pre]proc format cntlin= cntl ;
run;[/pre]
*4 use the format ;
* assume data set K is in order of variable N ;[pre]data joined ;
aSum = 0 ;
aN = 0 ;
do until( last.N ); * loop over the N group;
set K ;
by N ;
asum1 = input( put( A, statB.), best32. ) ;
* input() needed because a format creates only a string value;
aSum = aSum + asum1 ; * totalling the B (subtotals) for A ;
aN +1 ;
end;
drop asum1 ;
run;[/pre]
This gives not only the total of B for each N, it provides also the count of unique values of A in each N in the K data set.

Was that what was wanted ?
If not please explain more.

Rather than MERGE I would refer to this as a Random Order Lookup,
because the A order is unimportant even though A is the join key
between data sets P and K.


PeterC
SAS Super FREQ
Posts: 8,721

Re: Using Proc Format for Merging Data

Hi:
PROC FORMAT is a way to link a display value with a data value. So, for example, if the data has the codes 1 and 2 for Gender, you might want them to display as "M" and "F" in any report you create. A user-defined format will be the mechanism for how that happens. A PROC FORMAT list is a lookup list. So, from that standpoint (as a lookup list), a user-defined format can be used to select records. Sometimes this table lookup method -- with formats -- is very efficient; sometimes it is more efficient to use an SQL join or a merge. The papers below all offer some information on using PROC FORMAT and talk about TABLE lookups.

http://www2.sas.com/proceedings/sugi26/p158-26.pdf
http://www2.sas.com/proceedings/sugi26/p062-26.pdf
http://www2.sas.com/proceedings/sugi31/249-31.pdf
http://www2.sas.com/proceedings/sugi31/249-31.pdf
http://www2.sas.com/proceedings/forum2007/116-2007.pdf

One possible way to run a SAS procedure, is with a WHERE statement, such as this:
[pre]
proc means data=mybigfile n sum mean;
var height;
class age;
where gender = 1;
run;
[/pre]

So this PROC MEANS will run ONLY on the subset of data which satisfies the WHERE criteria. The PROC MEANS could be modified to use a more complex WHERE clause, but all the variables being tested must exist in the data set being sent to the procedure. So, if the MYBIGFILE data set contains a variable called "ID" then I could use ID in a WHERE statement. If MYBIGFILE did NOT have a variable called ID, then I could not use ID in a WHERE statement. In other words, if I wanted to do this:
[pre]
where GENDER = 1 and ID in (1, 2, 3, 4);
[/pre]

Then both GENDER and ID need to be in the data set being passed to PROC MEANS, or any procedure.

SAS has many procedures that are used for counting and summing records. You have posted this same SQL join question in several forums. It seems like you're searching for a solution -- either a hash table approach, a PROC TABULATE approach, a table lookup approach, or some approach that will approximate what you can express in the SQL join.

These forum postings on TABLE lookup may help you:
http://support.sas.com/forums/thread.jspa?messageID=3292೜
http://support.sas.com/forums/thread.jspa?messageID=8346ₚ
http://support.sas.com/forums/thread.jspa?threadID=2097&tstart=0

Your best bet for help with this question may be to contact Tech Support, they can help you figure out the best, most efficient approach, given the size of your input files, whether they're stored in SAS tables or in another database format and your particular processing needs. If all you need is a simple sum and count report, then they will probably be able to help you quicker than if you post the same query to different forums.

To send a question to Tech Support, go to http://support.sas.com/ and in the left-hand navigation pane, click on the link entitled "Submit a Problem".

cynthia
N/A
Posts: 0

Re: Using Proc Format for Merging Data

to Peter_C: Thank you very much! Quite detailed explanation. I am investigating each line you wrote, but still have a questions.
1) What is aSum1 in your example? It appears just from the sky and I cannot understand this.
2) What is aN?
3) Where should I filter records ( where K.n in (11, 12, 17, 21) )?

to Cynthia: Thank you for the literature. Reading...
N/A
Posts: 0

Re: Using Proc Format for Merging Data

1[pre]asum1 = input( put( A, statB.), best32. ) ; [/pre]in this statement [pre] put( A, statB.) [/pre] does the lookup for key=A to get the total of B created by the proc summary through the user format statB.

2
aN is a counter of the rows in K which share the same value of N

3
aaaaaaaah, sorry.
Forgot to add the where clause. Replace the statement [pre]set K ;[/pre] with [pre]set K( where=( N in (11, 12, 17, 21) )) ; [/pre] That should do it.
N/A
Posts: 0

Re: Using Proc Format for Merging Data

sorry, I just missed "1" in asum1

Now it says that BY variables are not properly sorted on data set K.
N/A
Posts: 0

Re: Using Proc Format for Merging Data

did you notice

*4 use the format ;
* assume data set K is in order of variable N ;

as it is not in order..... [pre] proc sort data= k ;
by N ;
run; [/pre]
Normally, I would output sorted data to a different data set and use a keep= list to reduce the sorting to the minimum. If there is no other use for the data in the N order, then the sort should be improved by using the where= clause. This requires a separate dataset coming out of the sort. so[pre]
proc sort data= k( where=( N in (11, 12, 17, 21) ) )
out= kS ;
by N ;
run; [/pre] now it is a different data set that is read in the stage 4.[pre]data joined ;
aSum = 0 ;
aN = 0 ;
do until( last.N ); * loop over the N group;
set kS ;
by N ;
asum1 = input( put( A, statB.), best32. ) ;
* input() needed because a format creates only a string value;
aSum = aSum + asum1 ; * totalling the B (subtotals) for A ;
aN +1 ;
end;
drop asum1 ;
run;
N/A
Posts: 0

Re: Using Proc Format for Merging Data

oh yeah... I remember that this is grouping and we should sort before.
The sum is ok, but count of 'a' for each 'n' is not correct Smiley Sad
Also, if there are no matches of 'a' in tables, the sum is something like 9.0767306E14. Can it be 0?
N/A
Posts: 0

Re: Using Proc Format for Merging Data

I'll leave that to you
N/A
Posts: 0

Re: Using Proc Format for Merging Data

Ok. Thank you very much. I hope this post will be useful for other beginner SAS developers.
N/A
Posts: 0

Re: Using Proc Format for Merging Data

Hello Peter

I understood why the count was wrong.
I created one more format, where _FREQ_=label, and increase it in Joined data set in the same way as sum, but I still cannot understand which format should I use to display zeros instead 9.6767E14.
N/A
Posts: 0

Re: Using Proc Format for Merging Data

good progress

if there is no value to be found by a format, it either returns the original input (value of A), or a special value that you can program into the user format, called the "OTHER" range.
That is controlled in a cntlin= dataset with the HLO='O' variable.
To see these in action, create an cltlout= dataset and look at its value for a format like[pre]proc format cntlout= cntlo ;
title 'demo of user format cntlout= data set' ;
value demo 0 ='zero' 1='one' other='more' ;
run;
proc print data= cntlo ;
run;[/pre]
Then you need to find a way to set correct values for HLO when you build your cntlin dataset.
The procedures manual and /or on line doc and help is worth researching at PROC FORMAT

PeterC
Ask a Question
Discussion stats
  • 11 replies
  • 726 views
  • 0 likes
  • 2 in conversation