BookmarkSubscribeRSS Feed
deleted_user
Not applicable
Hi
I have known that it's possible to use proc format for merging data.
I am trying to manage it at the moment, but can't understand it completely.
Please, help me if somebody is familiar with that.

It's too much text to explain what merging I need, so I just write sql query, that I need to perform using data step + proc format:

create table testtable as (select K.n, sum(P.b), count(P.a)
from P right join K on K.a = P.a
where K.n in (11, 12, 17, 21)
group by K.n);

all the fields are numeric


Message was edited by: Hippo
11 REPLIES 11
deleted_user
Not applicable
I'll try digesting into base sas code with summary
Assuming A is numeric ;
*we want total of B for each A in P, before looking at columns in K ;
* create look-up from A to sum-of-B-within-A;
*1 create sum-of-B-within-A ;[pre]proc summary data= P missing ;
class A ;
var B ;
output sum= out= sumB ;
run;[/pre]
*2 adapt to layout for proc format cntlin= ;[pre]data cntl ;
retain fmtname 'statB' ;
set sumB( rename=( A=start B=label ) ) ;
keep fmtname start label ;
run;[/pre]
*3 build format ;[pre]proc format cntlin= cntl ;
run;[/pre]
*4 use the format ;
* assume data set K is in order of variable N ;[pre]data joined ;
aSum = 0 ;
aN = 0 ;
do until( last.N ); * loop over the N group;
set K ;
by N ;
asum1 = input( put( A, statB.), best32. ) ;
* input() needed because a format creates only a string value;
aSum = aSum + asum1 ; * totalling the B (subtotals) for A ;
aN +1 ;
end;
drop asum1 ;
run;[/pre]
This gives not only the total of B for each N, it provides also the count of unique values of A in each N in the K data set.

Was that what was wanted ?
If not please explain more.

Rather than MERGE I would refer to this as a Random Order Lookup,
because the A order is unimportant even though A is the join key
between data sets P and K.


PeterC
Cynthia_sas
SAS Super FREQ
Hi:
PROC FORMAT is a way to link a display value with a data value. So, for example, if the data has the codes 1 and 2 for Gender, you might want them to display as "M" and "F" in any report you create. A user-defined format will be the mechanism for how that happens. A PROC FORMAT list is a lookup list. So, from that standpoint (as a lookup list), a user-defined format can be used to select records. Sometimes this table lookup method -- with formats -- is very efficient; sometimes it is more efficient to use an SQL join or a merge. The papers below all offer some information on using PROC FORMAT and talk about TABLE lookups.

http://www2.sas.com/proceedings/sugi26/p158-26.pdf
http://www2.sas.com/proceedings/sugi26/p062-26.pdf
http://www2.sas.com/proceedings/sugi31/249-31.pdf
http://www2.sas.com/proceedings/sugi31/249-31.pdf
http://www2.sas.com/proceedings/forum2007/116-2007.pdf

One possible way to run a SAS procedure, is with a WHERE statement, such as this:
[pre]
proc means data=mybigfile n sum mean;
var height;
class age;
where gender = 1;
run;
[/pre]

So this PROC MEANS will run ONLY on the subset of data which satisfies the WHERE criteria. The PROC MEANS could be modified to use a more complex WHERE clause, but all the variables being tested must exist in the data set being sent to the procedure. So, if the MYBIGFILE data set contains a variable called "ID" then I could use ID in a WHERE statement. If MYBIGFILE did NOT have a variable called ID, then I could not use ID in a WHERE statement. In other words, if I wanted to do this:
[pre]
where GENDER = 1 and ID in (1, 2, 3, 4);
[/pre]

Then both GENDER and ID need to be in the data set being passed to PROC MEANS, or any procedure.

SAS has many procedures that are used for counting and summing records. You have posted this same SQL join question in several forums. It seems like you're searching for a solution -- either a hash table approach, a PROC TABULATE approach, a table lookup approach, or some approach that will approximate what you can express in the SQL join.

These forum postings on TABLE lookup may help you:
http://support.sas.com/forums/thread.jspa?messageID=3292೜
http://support.sas.com/forums/thread.jspa?messageID=8346ₚ
http://support.sas.com/forums/thread.jspa?threadID=2097&tstart=0

Your best bet for help with this question may be to contact Tech Support, they can help you figure out the best, most efficient approach, given the size of your input files, whether they're stored in SAS tables or in another database format and your particular processing needs. If all you need is a simple sum and count report, then they will probably be able to help you quicker than if you post the same query to different forums.

To send a question to Tech Support, go to http://support.sas.com/ and in the left-hand navigation pane, click on the link entitled "Submit a Problem".

cynthia
deleted_user
Not applicable
to Peter_C: Thank you very much! Quite detailed explanation. I am investigating each line you wrote, but still have a questions.
1) What is aSum1 in your example? It appears just from the sky and I cannot understand this.
2) What is aN?
3) Where should I filter records ( where K.n in (11, 12, 17, 21) )?

to Cynthia: Thank you for the literature. Reading...
deleted_user
Not applicable
1[pre]asum1 = input( put( A, statB.), best32. ) ; [/pre]in this statement [pre] put( A, statB.) [/pre] does the lookup for key=A to get the total of B created by the proc summary through the user format statB.

2
aN is a counter of the rows in K which share the same value of N

3
aaaaaaaah, sorry.
Forgot to add the where clause. Replace the statement [pre]set K ;[/pre] with [pre]set K( where=( N in (11, 12, 17, 21) )) ; [/pre] That should do it.
deleted_user
Not applicable
sorry, I just missed "1" in asum1

Now it says that BY variables are not properly sorted on data set K.
deleted_user
Not applicable
did you notice

*4 use the format ;
* assume data set K is in order of variable N ;

as it is not in order..... [pre] proc sort data= k ;
by N ;
run; [/pre]
Normally, I would output sorted data to a different data set and use a keep= list to reduce the sorting to the minimum. If there is no other use for the data in the N order, then the sort should be improved by using the where= clause. This requires a separate dataset coming out of the sort. so[pre]
proc sort data= k( where=( N in (11, 12, 17, 21) ) )
out= kS ;
by N ;
run; [/pre] now it is a different data set that is read in the stage 4.[pre]data joined ;
aSum = 0 ;
aN = 0 ;
do until( last.N ); * loop over the N group;
set kS ;
by N ;
asum1 = input( put( A, statB.), best32. ) ;
* input() needed because a format creates only a string value;
aSum = aSum + asum1 ; * totalling the B (subtotals) for A ;
aN +1 ;
end;
drop asum1 ;
run;
deleted_user
Not applicable
oh yeah... I remember that this is grouping and we should sort before.
The sum is ok, but count of 'a' for each 'n' is not correct 😞
Also, if there are no matches of 'a' in tables, the sum is something like 9.0767306E14. Can it be 0?
deleted_user
Not applicable
I'll leave that to you
deleted_user
Not applicable
Ok. Thank you very much. I hope this post will be useful for other beginner SAS developers.
deleted_user
Not applicable
Hello Peter

I understood why the count was wrong.
I created one more format, where _FREQ_=label, and increase it in Joined data set in the same way as sum, but I still cannot understand which format should I use to display zeros instead 9.6767E14.
deleted_user
Not applicable
good progress

if there is no value to be found by a format, it either returns the original input (value of A), or a special value that you can program into the user format, called the "OTHER" range.
That is controlled in a cntlin= dataset with the HLO='O' variable.
To see these in action, create an cltlout= dataset and look at its value for a format like[pre]proc format cntlout= cntlo ;
title 'demo of user format cntlout= data set' ;
value demo 0 ='zero' 1='one' other='more' ;
run;
proc print data= cntlo ;
run;[/pre]
Then you need to find a way to set correct values for HLO when you build your cntlin dataset.
The procedures manual and /or on line doc and help is worth researching at PROC FORMAT

PeterC

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

New Learning Events in April

 

Join us for two new fee-based courses: Administrative Healthcare Data and SAS via Live Web Monday-Thursday, April 24-27 from 1:00 to 4:30 PM ET each day. And Administrative Healthcare Data and SAS: Hands-On Programming Workshop via Live Web on Friday, April 28 from 9:00 AM to 5:00 PM ET.

LEARN MORE

Discussion stats
  • 11 replies
  • 1674 views
  • 0 likes
  • 2 in conversation