SAS Data Integration Studio, DataFlux Data Management Studio, SAS/ACCESS, SAS Data Loader for Hadoop and others

Point= option & By statement

Accepted Solution Solved
Reply
Super Contributor
Posts: 340
Accepted Solution

Point= option & By statement

Hello,

I would like to get the following observations of a data set by groups. The - working (?) - solution would be this:

Data Have;
  Input N $ X @@;
  Datalines ;
  A 1 A 3 A 6 B 2 B 9 B 9 C 1 C 5 C 6 C 7 C 8
  ;
Run;

Data Want;
  Obs=_N_+1;
  Set Have End=Eof;
  By N; /* Can I do this? */
  If not EoF Then Set Have (Keep=X Rename=(X=X_Next)) Point=Obs;
  Else Call Missing (X_Next);
  If Last.N Then Call Missing (X_Next);
Run;

However, in the documentation it says (SAS(R) 9.2 Language Reference: Dictionary, Fourth Edition):

Restriction:You cannot use POINT= with a BY statement, a WHERE statement, or a WHERE= data set option. In addition, you cannot use it with transport format data sets, data sets in sequential format on tape or disk, and SAS/ACCESS views or the SQL procedure views that read data from external files.

The solution is required to be reliable. My question is: Is this combination of point and by o.k.?

Thanks&kind regards


Accepted Solutions
Solution
‎09-18-2014 05:08 AM
Trusted Advisor
Posts: 3,212

Re: Point= option & By statement

Posted in reply to user24feb

Yes your coding is a valid approach. It is thinking about:

-  what happens with the PDV

- automatic advancing of records

- at what level the internal housekeeping is done

Your by step is working with the first set statement, there is no point defined.

You second set statement is having different options it acts sepeartely. You could use a table lookup (example 8 Set statement) SAS(R) 9.4 Statements: Reference, Third Edition

Using an index is an binary search on some values and than jumping to the complete record using a point / direct access. Using the point is thinking in an own indexed way.

A very nice paper is: http://support.sas.com/resources/papers/proceedings09/038-2009.pdf  The double dow loop is using the different internal householding to restart calculations within an ordered group.

Why would you use these kind of constructs?

They can be much faster in execution time and resource usage common known constructs including SQL. It the reason of the hyping of NOSQL.

The disadvantage is that you have analyze your problem very well to understand what should be done and than the coding part takes more effort and verifying.   

---->-- ja karman --<-----

View solution in original post


All Replies
Super Contributor
Posts: 308

Re: Point= option & By statement

Posted in reply to user24feb

Hello,

Why not to use a merge ?

data want;
merge have have(firstobs=2 rename=(x=x_next n=n_next));

If n ne n_next then Call Missing (X_Next);
drop n_next;
run;

SAS Employee
Posts: 340

Re: Point= option & By statement

Posted in reply to user24feb

Your datastep is OK.

There are other methods to implement look ahead in a data step, but I like your version as well. Smiley Happy

see:

Look-Ahead and Look-Back - sasCommunity

Solution
‎09-18-2014 05:08 AM
Trusted Advisor
Posts: 3,212

Re: Point= option & By statement

Posted in reply to user24feb

Yes your coding is a valid approach. It is thinking about:

-  what happens with the PDV

- automatic advancing of records

- at what level the internal housekeeping is done

Your by step is working with the first set statement, there is no point defined.

You second set statement is having different options it acts sepeartely. You could use a table lookup (example 8 Set statement) SAS(R) 9.4 Statements: Reference, Third Edition

Using an index is an binary search on some values and than jumping to the complete record using a point / direct access. Using the point is thinking in an own indexed way.

A very nice paper is: http://support.sas.com/resources/papers/proceedings09/038-2009.pdf  The double dow loop is using the different internal householding to restart calculations within an ordered group.

Why would you use these kind of constructs?

They can be much faster in execution time and resource usage common known constructs including SQL. It the reason of the hyping of NOSQL.

The disadvantage is that you have analyze your problem very well to understand what should be done and than the coding part takes more effort and verifying.   

---->-- ja karman --<-----
Super User
Posts: 10,020

Re: Point= option & By statement

Posted in reply to user24feb

Here is two solution . one is using routine CALL SET () .  another is making a flag variable.

data want;
 set have;
 by n;
 next=.; 
retain dsid;
 if _n_ eq 1 then dsid=open('have(keep=x rename=(x=next))','i');
  call set(dsid);
 if not last.n then rc=fetchobs(dsid,_n_+1);
  run;


Data Have;
  Input N $ X @@;
  Datalines ;
  A 1 A 3 A 6 B 2 B 9 B 9 C 1 C 5 C 6 C 7 C 8
  ;
Run;


proc sort data=have;by n;run;
data want;
 set have nobs=nobs end=last;
 if _n_ ne nobs then do;
  obs=_n_+1;
  set have(Rename=(n=_n X=X_Next)) point=obs ;
 end;
  If n ne _n or last Then Call Missing (X_Next);     
run;

Xia Keshan

🔒 This topic is solved and locked.

Need further help from the community? Please ask a new question.

Discussion stats
  • 4 replies
  • 363 views
  • 6 likes
  • 5 in conversation