How to revisit observations in DS2

Super Contributor
Posts: 298

Hello everyone,

I wonder whether it is possible to revisit observations in DS2 when I don't know beforehand how many times I will need to revisit each one. This is the case if, for instance, I use a Newton-Raphson procedure to maximize a function and don't know how many iterations are needed for convergence. In each iteration I need to go through all observations. The reason I want to use DS2 is to get access to the matrix package.

In the ordinary DATA step I know of two ways to get random access to observations: the open() function and the point= option. Simple examples of their use:

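/* Test data: three observations with obs = 1, 2, 3 */
data test2; do obs=1 to 3; output; end; run;

/* Method 1: open() with fetchobs() */
data _NULL_;
  dsid=open('test2');
  call set(dsid);                    /* link data set variables to DATA step variables */
  do i=1 to 2;                       /* two passes over the data */
    do j=1 to attrn(dsid,'nobs');
      obsid=fetchobs(dsid,j);        /* read observation number j */
      put obs=;
    end;
  end;
  rc=close(dsid);                    /* release the data set */
run;

/* Method 2: set with point= */
data _NULL_;
  do i=1 to 2;                       /* two passes over the data */
    do j=1 to 3;
      set test2 point=j;             /* read observation number j */
      put obs=;
    end;
  end;
  stop;                              /* required, since point= never reaches end of file */
run;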

Unfortunately, neither of these two methods is available in DS2.

Trusted Advisor
Posts: 1,301

Re: How to revisit observations in DS2

Posted in reply to JacobSimonsen

In DS2 you could get SET KEY= functionality by using the SQLSTMT package.

Respected Advisor
Posts: 4,173

Re: How to revisit observations in DS2

I don't have a lot of experience with DS2 - but isn't SQLSTMT only used in a Federation Server context?

Respected Advisor
Posts: 4,173

Re: How to revisit observations in DS2

Posted in reply to JacobSimonsen

You could use the hash or hash iterator package. That should allow you to iterate through the data as many times as you need.
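A minimal sketch of that idea, not a tested solution: it assumes the test2 table with the numeric variable obs from the question, and the names h, hi, pass and rc are made up for illustration. The table is loaded into a hash package once, and a hash iterator then walks through it once per pass. The fixed two passes stand in for a convergence test, which would go in the outer loop condition.

```sas
/* Sketch only: test2 and obs come from the original post;
   h, hi, pass and rc are hypothetical names. */
proc ds2;
data _null_;
  declare double obs;                /* host variable for key and data */
  declare int rc pass;
  declare package hash h();
  declare package hiter hi('h');     /* iterator over hash h */

  method init();
    h.keys([obs]);
    h.data([obs]);
    h.dataset('test2');              /* load the table once */
    h.ordered('ascending');
    h.defineDone();

    /* Revisit every observation once per pass. A real program would
       loop until convergence instead of a fixed number of passes. */
    do pass = 1 to 2;
      rc = hi.first();
      do while (rc = 0);
        put pass= obs=;
        rc = hi.next();
      end;
    end;
  end;
enddata;
run;
quit;
```

Because the whole table sits in memory, this only fits data that is small enough for the hash package, and it runs in a single data program rather than in threads.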

Trusted Advisor
Posts: 1,301

Re: How to revisit observations in DS2

You are confusing two things that are not necessarily related: the SAS Federation Server product and the SAS FedSQL language. Given their names, I don't blame you.

FedSQL is an implementation, by SAS, of ANSI SQL:1999, and attempts to be a 'vendor neutral SQL' that is compliant with most any DBMS that has a SQL interpreter.  FedSQL in SAS is used as the SQL language in PROC FEDSQL and PROC DS2.  These procedures and the FedSQL language are in no way dependent on the Federation Server product.

Respected Advisor
Posts: 4,173

Re: How to revisit observations in DS2

Thanks. Good to know. I definitely have some more reading to do.

I've seen the value of DS2 for CI implementations using the Federation Server, and I am starting to understand some of the other areas where it could add value (proc hpds2). I haven't found a real-life application in my own line of work yet, but that may only be because I don't fully understand it. Still hoping for some exciting white papers showing how it's actually used in practice (I know that some solutions use it "in the background").

Trusted Advisor
Posts: 1,301

Re: How to revisit observations in DS2

One of the most common tasks I use PROC DS2 for is model scoring. I have seen significant benefits from the threading for these compute-heavy tasks. Semantically, I also really like DS2. While it may take a few more lines of code to duplicate simple data steps, in many situations I find that I can reduce overall complexity, especially with highly recursive routines.

Some additional stream of consciousness thought:

From what I gathered at SGF2014, a lot of focus in DS2 is being placed on integration with Hadoop. 

I am very excited about the recent release of the HTTP package and hope that non-fixed, traversable data types will come soon, so that I can start dynamically ingesting and creating JSON and XML data to interact with web service APIs more simply.

Respected Advisor
Posts: 4,927

Re: How to revisit observations in DS2

Hi Fried,

What is a non-fixed and traversable data type?

PG

Trusted Advisor
Posts: 1,301

Re: How to revisit observations in DS2

A better word may be non-sized instead of non-fixed. I mean a hierarchical, self-defining data structure, such as JSON or XML, that could be received in DS2 without prior knowledge of its size or components and then traversed as an iterable object to get the key/data components. In essence it would be similar to a hash object where, instead of putting a dataset into the object, I put a JSON/XML document into it...

Super Contributor
Posts: 298

Re: How to revisit observations in DS2

I couldn't get the hash-object solution to work as intended. It is not a problem to define the hash object, and it is easy enough to read data into it. But how do I let each thread read a different part of the hash object? If I pass an argument to the thread, all threads receive the same argument.

The goal is that the threads should work on different parts of the data, in such a way that the data can be revisited.

Trusted Advisor
Posts: 3,215

Re: How to revisit observations in DS2

Posted in reply to JacobSimonsen

Jacob, it makes sense that the hash object is not shared between the different threads. Sharing would only work with a SAS dataset structure, as the threads running in an RDBMS are out of reach for DS2.

---->-- ja karman --<-----
SAS Employee
Posts: 340

Re: How to revisit observations in DS2

Posted in reply to JacobSimonsen

Maybe the only way to do it is to load the hash object with the hash.add() method inside the thread (do not use "datasource" when you declare the hash object!):

thread my_thread;
  method run();
    set my_input;
    my_hash.add();
    ...
  end;
endthread;
run;

This way each thread gets its own, separate hash object.

You will probably use the by statement as well. I think all this makes sense only if you do by-group processing.

At the beginning of a group (first.group_var=1) you clear your hash object ( my_hash.clear() ).

At every read you just store the data in the hash object (my_hash.add()).

At the end of a group (last.group_var=1) you start to analyze data in the hash object.

Trusted Advisor
Posts: 3,215

Re: How to revisit observations in DS2

Posted in reply to gergely_batho

Gergely, the question would be whether all threads can share the same memory set up by one of them (a hash lives in memory).

That requires a synchronisation process with locking, as in an RDBMS, and a shared entry point for shared memory. I do not expect it to be there. These are system programming techniques close to kernel level. Multi-user address spaces were dependent on those.

Of course you can define a hash in every thread, but then you lose the benefit of using a hash (collecting all random events in one place) while putting pressure on memory.

---->-- ja karman --<-----
SAS Employee
Posts: 340

Re: How to revisit observations in DS2

Hi Jaap,

I agree with you. DS2 threads do not share memory.

But my reply was just a short answer/idea on this topic: "The goal is that the threads should work on different parts of the data".

Super Contributor
Posts: 298

Re: How to revisit observations in DS2

I have now succeeded in writing a program that can revisit the observations. I use a hash object defined in each individual thread, as discussed above.

Unfortunately the method is not time-efficient, but it was a fun experiment anyway :). I think the problem is that all threads use the same memory resource (as Jaap points out). I observe that all four of my CPUs are in use during the process, but together they use only about 25% of total CPU capacity. The log shows that real time is only slightly smaller than CPU time.

Another problem is that the method is not very stable. Most often it runs without problems, but crashes are not rare.

I have enclosed the program for those who are interested. It is a simple estimation of parameters in a logistic regression where I have a by-group variable.

(It is only made for test purposes; my real problem is far more complicated than logistic regression.)
