Lag and data+set

Hi all.

I want to fill a field with the sum of that same field's values in the previous records, provided several conditions are met.

For example, I need to know the number of days on which a client has connected to my application.

The initial table is:

ID_CLIENT DATE CONNECTED
1 10-10-2007 Yes
1 10-11-2007 No
1 10-12-2007 Yes
2 08-10-2007 Yes
2 15-11-2007 Yes
2 20-12-2007 Yes
3 25-10-2007 Yes
3 30-10-2007 No

And the final table should be:

ID_CLIENT DATE CONNECTED NUMBER
1 10-10-2007 Yes 1
1 10-11-2007 No 1
1 10-12-2007 Yes 2
2 08-10-2007 Yes 1
2 15-11-2007 Yes 2
2 20-12-2007 Yes 3
3 25-10-2007 Yes 1
3 30-10-2007 No 1


Client 1 connected to the application on 10-10-2007 and 10-12-2007. In the first row (date 10-10-2007) the field NUMBER must be 1, because he connected that day. In the second row NUMBER is still 1 (he didn't connect on 10-11-2007, so the value is the same as in the previous row). In the third row NUMBER is 2 (he connected again).

I tried to do it with LAG statements in a DATA step with SET and BY, but I'm new to SAS and I can't get it to work.

I hope that makes sense!!!

Thanks!!!

Re: Lag and data+set

Hello

First of all, SAS is a fantastic data processing tool, and once you understand some techniques and syntax you can do just about anything.

There are many ways to do this in SAS. Unfortunately for new SAS users, all of the methods will be challenging. The method below is something SAS calls "BY-group processing", and in my opinion it is one of the most powerful data processing features of SAS. It would behoove you to understand it thoroughly. The SAS documentation does a good job on this. Below are the link and the navigation path. The documentation can be difficult to navigate, but you'll get used to it after a while.

Go to http://support.sas.com/onlinedoc/913/docMainpage.jsp
On the menu go to Base SAS ->
SAS Language Reference: Concepts ->
DATA Step Concepts ->
BY-Group Processing in the DATA Step

Now for the example:

* Create a SAS data set;
data one;
length id_client date 8 connected $ 3;
informat date ddmmyy10.; /* read dates as dd-mm-yyyy; declared before INPUT so it clearly applies */
format date ddmmyy10.;
input id_client date connected $;
datalines;
1 10-10-2007 Yes
1 10-11-2007 No
1 10-12-2007 Yes
2 08-10-2007 Yes
2 15-11-2007 Yes
2 20-12-2007 Yes
3 25-10-2007 Yes
3 30-10-2007 No
;
run;

* Sort by client and date;
proc sort data=one;
by id_client date;
run;

* This is an advanced technique for a SAS newbie, but it works;
data two;
set one;
by id_client; /* enables BY-group processing, in which
SAS creates 2 automatic Boolean variables: 1) first.id_client = 1 for the first occurrence of each id_client, else 0
2) last.id_client = 1 for the last occurrence of each id_client, else 0
*/
retain number ; * the RETAIN statement keeps the previous value of number unless you explicitly change it;

if first.id_client=1 then number=0; /* on the first occurrence of each id_client the counter is reset to 0 */
if connected eq 'Yes' then number=number+1; /* each time you see a Yes the counter is incremented by 1 */
run;

/* To see what the first. and last. automatic variables are doing, run this code. */
/* It is only for learning how BY-group processing works. */
data WhatSASisDoing;
set one;
by id_client;
retain number;
first=first.id_client; /* SAS won't write automatic variables to the output data set unless you copy them to ordinary variables */
last=last.id_client;

if first.id_client=1 then number=0; /* on the first occurrence of each id_client the counter is reset to 0 */
if connected eq 'Yes' then number=number+1; /* each time you see a Yes the counter is incremented by 1 */

run;
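
A more compact variant of the same counter, offered only as a sketch: a SAS sum statement (variable + expression;) retains its variable automatically, so no separate RETAIN is needed. The data set name "three" is made up for illustration; it assumes data set one has already been sorted by id_client as above.

```sas
* Sketch: same running count, using a sum statement instead of RETAIN;
data three;                               /* "three" is an illustrative name */
  set one;
  by id_client;
  if first.id_client then number = 0;     /* restart the count for each client */
  number + (connected eq 'Yes');          /* the comparison is 1 or 0, and the sum statement retains number */
run;
```

Note that LAG is not required for this problem. LAG returns the value stored the last time the function itself was executed, which is not always the previous observation when the call sits inside an IF, so a retained counter is usually easier to reason about here.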


-Darryl

Re: Lag and data+set

Darryl provides the best solution.

Most programmers who are new to SAS have difficulty at first. This is because SAS is not like the more traditional languages: C, Pascal, COBOL, etc. Even object-oriented languages like Java are more conventional, and more process- and control-oriented, than SAS.

SAS is a data-centric language created to simplify programming for data analysts.
It assumes many things:
1) you have a set of data that is a collection of records or observations (observation = record from an experiment).
2) each observation has a set of variables = fields = attributes.
3) you want to read through the list of observations and do stuff to analyze the data.

Thus
[pre]
data outdata;
set indata;
...
run;
[/pre]
creates a data set called outdata by applying transformations to the data set called indata. SAS "opens" outdata for writing and indata for reading, then loops through indata, reading each observation and applying the "..." transformations. In other languages, you would have to define a variable to hold the file descriptor for outdata, and another one for indata. You would have to define a structure (struct) that describes how the data is laid out. Then you "open" the files, read the first observation, code a "do while" loop, code the transformations, code a second read within the loop, and finally code the closing of the files. In C it would look something like:
[pre]
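FILE *indata, *outdata;                  /* file handles are pointers in C */
struct in_record  { ... } in_buffer;     /* layout of one input observation */
struct out_record { ... } out_buffer;    /* layout of one output observation */

indata = fopen ... ;
outdata = fopen ... ;
while (fread(&in_buffer, sizeof in_buffer, 1, indata) == 1)
{
....                                     /* transformations, then write out_buffer */
}
fclose(indata);
fclose(outdata);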
[/pre]
It has been years since I've written C, so I probably made some syntax errors, but the idea should be evident. SAS coding is a lot simpler and faster, in most cases.

Because it is common to want to look at the data in groups, and to identify the first and the last observation of each group, SAS provides the "first.[variable]" and "last.[variable]" language elements.
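
As a small sketch of last.[variable] in use: assuming the data set two built in the earlier reply (sorted by id_client, with the running count in number), the step below keeps one row per client holding the final count. The name "totals" is made up for illustration.

```sas
* Sketch: one row per client with the total number of connected days;
data totals;                     /* "totals" is an illustrative name */
  set two;                       /* data set two from the earlier reply */
  by id_client;
  if last.id_client;             /* subsetting IF: keep only the last row of each group */
  keep id_client number;
run;
```

With the sample data in this thread, that should leave three rows: client 1 with 2, client 2 with 3, and client 3 with 1.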

Each time SAS loops through the DATA step, it first resets the variables you create in the step (by assignment or INPUT) to "missing" (null values), because it is common to "initialize" variables before reading in data and doing things with it. Variables read with SET keep their values automatically. So, if you want to carry a value you compute, like a sum or a count, from one observation to the next, you need to "RETAIN" its value.
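
To see the difference for yourself, here is a minimal sketch that counts the same thing twice, once with RETAIN and once without. It assumes the data set one from the earlier reply; the names retain_demo, kept and notkept are made up for illustration.

```sas
* Sketch: a retained counter next to a non-retained one;
data retain_demo;                /* illustrative name */
  set one;
  retain kept 0;                 /* keeps its value across iterations and starts at 0 */
  if connected eq 'Yes' then do;
    kept    = kept + 1;          /* grows by 1 on every Yes */
    notkept = notkept + 1;       /* notkept is reset to missing each iteration, and missing + 1 is missing */
  end;
run;
```

Print retain_demo and you should see kept climbing while notkept stays missing on every row. The log will also carry a note about missing values being generated.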

Since you seem to be new to SAS, >>>>> Welcome <<<<<< !
Please find the Base SAS documentation for your version on the SAS support site and begin reading, especially the SAS concepts sections.

SAS is an incredibly rich language/system.
For me, it is "The greatest data processing language on the planet".

Have fun.