BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
ebeth
Fluorite | Level 6

Hello, I'm a 2 month old SAS user and just started practicing COUNTER, RETAIN, FIRST. ,Last. and DO/END. I can  understand simple examples (of course) but am getting stuck when there is missing data and multiple variables involved. 

 

For example:

This data is sorted BY ID DATE TYPE already and the columns in red are what I'm trying to add in.

I think Num_A should be counter+1 when the variable is the first non=missing value for A and then start over for each date. And eventually if the ID changed as well. The same for num_B and C. 

 

IDDATETYPEABCNUM_ANUM_BNUM_C  
11/1/2010B.10.010
11/1/2010A3..100
11/1/2010A7..200
11/1/2010C..11001
13/17/2010B.5.010
13/17/2010A4..100
13/17/2010C..3001
15/1/2010B.7.010
15/1/2010A3..100
15/1/2010C..400

1

 

1 ACCEPTED SOLUTION

Accepted Solutions
Patrick
Opal | Level 21

@ebeth wrote:

This worked almost perfectly. Thank you. 

One odd thing though is that the very first NUM_A is a missing value. The only missing value.  Instead of zero. I thought the RETAIN statement set that to zero?


Good catch. The RETAIN statement just retains values instead of setting them to missing before reading in a new line of data.

What's happening here: The first value of B is not missing so here the logic for the value in the first observation leads to the else case: NUM_B=NUM_B+1;

Because here variable NUM_B is still missing, the formula sums the values <missing>+1. This results is a missing.

To avoid such a case: Either set all the values to 0 whenever there is a new id (if first.id) or even simpler use the sum() function as done below. The sum() function ignores missings and though the result of sum(<missing>,1) is 1

 

Below code demonstrates 3 different coding options all returning the same result.

Call missing() allows to set an array of variables to missing. There is unfortunately still no call routine which would allow to set variables to another value; except for a not very often used "hack" using call stdize() 

/* create desired result */
data want;
  set have;
  by id;
  retain NUM_A NUM_B;
  if first.id then 
    do;
      NUM_A=0;
      call missing(NUM_B, NUM_C);
      /** alternatively use below syntax to set the variables to zero
      call stdize('replace','mult=',0,'none',NUM_A,NUM_B,NUM_C);
      **/
    end;
  if missing(a) then NUM_A=0; else NUM_A=NUM_A+1;
  if missing(b) then NUM_B=0; else NUM_B=sum(NUM_B,1);
  /* using syntax NUM_C+1 implicitly retains the variable */
  if missing(c) then NUM_C=0; else NUM_C+1;
run;

Also note that if the syntax to calculate NUM_C then SAS not only retains the variable implicitly but it also behaves the same like when using the sum() function.

View solution in original post

11 REPLIES 11
Tom
Super User Tom
Super User

You need explain what your red columns mean, as clearly as you can.

Note your words mention COUNTER, but there is no such variable shown, Either as input or as one to be derived.

 

Also your data is sorted by ID and DATE , but not by TYPE.  How was the order among the rows with the same values of ID and DATE set?  

ebeth
Fluorite | Level 6

I used PROC SORT to sort BY ID DATE (must have forgotten TYPE). If I do sort by TYPE as well though, my goal is to add in the 3 columns in red. The red columns are not part of the original data; I want to create those counts. 

 

For example, NUM_A should count "1", then  "2" for the first 2 A's on the same date and ID.  or 3 if there is a 3rd A for that date and ID Then start the count over for the next A on a different date, but same ID.  

Tom
Super User Tom
Super User

So you want running count of NON missing values of A, B and C?

data want;
  set have;
  by id date type;
  if first.date then do;
    num_a=0;
    num_b=0;
    num_c=0;
  end;
  num_a +  not missing(A);
  num_b +  not missing(B);
  num_c +  not missing(C);
run;

If you want to count number of distinct values of A (or B or C) then hat will be harder.

ebeth
Fluorite | Level 6

Yes, I want running count of non missing values of A, B and C.

 

This works perfectly for NUM_B and NUM_C.

But NUM_A  is showing a "1" for every line on the same date. Then a 0 for the next new date. Even when there is a missing value of A it outputs 1 until new date.

Tom
Super User Tom
Super User

Show the example data that is not producing expected results. 

Also from your original example it looks like the A,B and C variables are not really needed as you could just use the value of TYPE instead to tell which variable to increment.

Patrick
Opal | Level 21

Below a coding option.

/* create sample data */
data have;
  infile datalines truncover dlm='|';
  input ID DATE:mmddyy10. TYPE $ A B C _NUM_A _NUM_B _NUM_C;
  format date date9.;
  datalines;
1|1/1/2010|B|.|10|.|0|1|0
1|1/1/2010|A|3|.|.|1|0|0
1|1/1/2010|A|7|.|.|2|0|0
1|1/1/2010|C|.|.|11|0|0|1
1|3/17/2010|B|.|5|.|0|1|0
1|3/17/2010|A|4|.|.|1|0|0
1|3/17/2010|C|.|.|3|0|0|1
1|5/1/2010|B|.|7|.|0|1|0
1|5/1/2010|A|3|.|.|1|0|0
1|5/1/2010|C|.|.|4|0|0|
;

/* create desired result */
data want;
  set have;
  by id;
  retain NUM_A NUM_B;
  if first.id then call missing(NUM_A, NUM_B, NUM_C);
  if missing(a) then NUM_A=0; else NUM_A=NUM_A+1;
  if missing(b) then NUM_B=0; else NUM_B=NUM_B+1;
  /* using syntax NUM_C+1 implicitly retains the variable */
  if missing(c) then NUM_C=0; else NUM_C+1;
run;

title 'Want';
proc print data=want;
run;
title;
ebeth
Fluorite | Level 6

This worked almost perfectly. Thank you. 

One odd thing though is that the very first NUM_A is a missing value. The only missing value.  Instead of zero. I thought the RETAIN statement set that to zero?

ballardw
Super User

@ebeth wrote:

This worked almost perfectly. Thank you. 

One odd thing though is that the very first NUM_A is a missing value. The only missing value.  Instead of zero. I thought the RETAIN statement set that to zero?


Retain can optionally assign a starting value:

 

Retain varname 0;

If no value is specified it will be missing.

 

You would have to show your actual code for us to see if there was something other cause of a missing value.

Patrick
Opal | Level 21

@ebeth wrote:

This worked almost perfectly. Thank you. 

One odd thing though is that the very first NUM_A is a missing value. The only missing value.  Instead of zero. I thought the RETAIN statement set that to zero?


Good catch. The RETAIN statement just retains values instead of setting them to missing before reading in a new line of data.

What's happening here: The first value of B is not missing so here the logic for the value in the first observation leads to the else case: NUM_B=NUM_B+1;

Because here variable NUM_B is still missing, the formula sums the values <missing>+1. This results is a missing.

To avoid such a case: Either set all the values to 0 whenever there is a new id (if first.id) or even simpler use the sum() function as done below. The sum() function ignores missings and though the result of sum(<missing>,1) is 1

 

Below code demonstrates 3 different coding options all returning the same result.

Call missing() allows to set an array of variables to missing. There is unfortunately still no call routine which would allow to set variables to another value; except for a not very often used "hack" using call stdize() 

/* create desired result */
data want;
  set have;
  by id;
  retain NUM_A NUM_B;
  if first.id then 
    do;
      NUM_A=0;
      call missing(NUM_B, NUM_C);
      /** alternatively use below syntax to set the variables to zero
      call stdize('replace','mult=',0,'none',NUM_A,NUM_B,NUM_C);
      **/
    end;
  if missing(a) then NUM_A=0; else NUM_A=NUM_A+1;
  if missing(b) then NUM_B=0; else NUM_B=sum(NUM_B,1);
  /* using syntax NUM_C+1 implicitly retains the variable */
  if missing(c) then NUM_C=0; else NUM_C+1;
run;

Also note that if the syntax to calculate NUM_C then SAS not only retains the variable implicitly but it also behaves the same like when using the sum() function.

ebeth
Fluorite | Level 6

Thank you!

ballardw
Super User

The very first thing you will need to explain is the sort order. Since to use FIRST. there must be a BY statement, then please at least share the BY statement you are using.

 

 

sas-innovate-2024.png

Don't miss out on SAS Innovate - Register now for the FREE Livestream!

Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 11 replies
  • 3878 views
  • 4 likes
  • 4 in conversation