About Manu_Jain

mkeintz · ‎09-20-2023

SQL does not care about incoming data order. This might be regarded as an advantage in many contexts (although it can generate a performance price). But unlike the MERGE (or other data step statements like SET), it provides no reliable (not to mention efficient) way to look ahead (or look back) from the observation in hand. The advantages of MERGE are intrinsically the advantages of sequential processing. Say you have a dataset sorted by ID/DATE and you want, for each record, the number of days between the current date and the preceding and successive dates in a dataset, within each ID. Then a simple self-merge (with the firstobs=2) dataset name parameter for one of the merged data streams, can suffice. Consider this sample of 500,000 ID's with about 5.8m observations, sorted by ID/DATE. The DATA WANT step uses MERGE (with a firstobs=2 parameter) to get the lookahead date (NEXT_DATE) and corresponding DAYS_UNTIL_NEXT_DATE, and LAG function to get PRIOR_DATE, and DAYS_SINCE_PRIOR_DATE. It takes about 0.81 seconds to run on my machine. Neither the lookahead technique or the LAG function are available in PROC SQL. data have; do id=1 to 500000; do date='01jan1990'd by 0 until (date>='01jan2020'd); output; date=date + ceil(1990*ranuni(0159855)); end; end; format date date9. ; run; data want (drop=next_id); merge have have (firstobs=2 rename=(id=next_id date=next_date)); if next_id^=id then call missing(next_date); if next_date^=. then days_to_next_date=next_date-date; prior_date=ifn(lag(id)=id,lag(date),.); if prior_date^=. then days_since_prior_date =date-prior_date; format prior_date date9. ; run; The analogous PROC SQL below would look (to me) relatively ugly. And it takes 16.73 seconds - about 20 times as long: proc sql; create table want_SQL as select L.*,R.next_date ,case when R.next_date^=. then R.next_date-date else . end as days_to_next_date ,R2.prior_date , case when R2.prior_date^=. then date-R2.prior_date else . end as days_since_prior_date from (select L.*,R.next_date from have as L left join have (rename=(date=next_date)) as R on R.id=L.id and next_date>date group by L.id,date having next_date=min(next_date) ) left join have (rename=(date=prior_date)) as R2 on L.id=R2.id and R2.prior_date<date group by L.id,date having prior_date=max(prior_date) ; quit; But the above is done with a minimal memory burden for PROC SQL - there are only 2 variables in dataset HAVE. Adding other memory-consuming vars will change the performance ratio considerably. In fact, when I added 6 character variables (each with length $200), the MERGE approach took 15 seconds, but the PROC SQL blew up with insufficient memory after 8 minutes.

Paul_NYS · ‎11-29-2012

The below transpose may work with some modifications, thank you. While it did not come out the way I was envisioning, I'll report back on the result. Paul proc transpose data=have out=want ; by cnty_name startyear agecat4 exitMonth cohortyeartotal cohortyeartotalage ; id exit; var OutOfHomeCare; run;

Online Status	Offline
Date Last Visited	‎09-01-2015 07:11 AM

Re: Proc export happens correctly but a weird dialogue box pop ups?

Re: Proc export happens correctly but a weird dialogue box pop ups?

Re: Proc export happens correctly but a weird dialogue box pop ups?

Re: difference between merg and proc sql join.

Re: Proc Transpose

Re: Proc Transpose

Re: Proc export happens correctly but a weird dialogue box pop ups?

Re: One-to-one merge and then force out

Re: date Format issue

Re: difference between merg and proc sql join.

Re: Proc Transpose