Solved: Re: Extracting value conditional the location of the observation

lioradam · Posted 10-17-2022 01:11 PM

Hello,

I have the following dataset:

value	string
-0.04	000R201406
-0.03	000R201409
0.13	000R201412
0.14	000R201503
0.02	000R201509
0.01	000R201512
0.07	000R201603
0.15	000R201606
0.13	000R201609
0.1	000R201612
0.12	000R201612
0.2	000R201703
0.08	000R201706
0.08	000R201709
0.1	000R201712
0.17	000R201803

and I want to receive the following data set:

value_2	value_1	string
	-0.04	000R201406
	-0.03	000R201409
	0.13	000R201412
	0.14	000R201503
	0.02	000R201509
	0.01	000R201512
	0.07	000R201603
	0.15	000R201606
	0.13	000R201609
0.12	0.1	000R201612
0.12	0.1	000R201612
	0.2	000R201703
	0.08	000R201706
	0.08	000R201709
	0.1	000R201712
	0.17	000R201803

Meaning, if there are two rows with identical "string" the value of the first row will enter a variable called: value_1

and the value of the second row will enter a variable called: value_2.

Does anyone have an idea what code to use?

Thank you,

Lior

Tom · Posted 10-17-2022 11:09 PM

I don't understand why you would do that.

I could understand transposing the data so that groups with multiple observations get multiple values.

But then why would you write the same observation out twice? (or N times if the there are N observations in the group).

data have;
  input value string :$10.;
cards;
-0.04 000R201406
-0.03 000R201409
0.13 000R201412
0.14 000R201503
0.02 000R201509
0.01 000R201512
0.07 000R201603
0.15 000R201606
0.13 000R201609
0.1 000R201612
0.12 000R201612
0.2 000R201703
0.08 000R201706
0.08 000R201709
0.1 000R201712
0.17 000R201803
;

proc transpose data=have out=want(drop=_name_) prefix=value_ ;
  by string;
  var value;
run;

proc print data=want;
run;

Result:

Obs      string      value_1    value_2

  1    000R201406     -0.04        .
  2    000R201409     -0.03        .
  3    000R201412      0.13        .
  4    000R201503      0.14        .
  5    000R201509      0.02        .
  6    000R201512      0.01        .
  7    000R201603      0.07        .
  8    000R201606      0.15        .
  9    000R201609      0.13        .
 10    000R201612      0.10       0.12
 11    000R201703      0.20        .
 12    000R201706      0.08        .
 13    000R201709      0.08        .
 14    000R201712      0.10        .
 15    000R201803      0.17        .

View solution in original post

mkeintz · Posted 10-17-2022 10:25 PM

The most intuitive solution is probably this double loop, the first with an embedded SET statement, the second with an embedded OUTPUT:

data want (drop=value i n);
  do n=1 by 1 until (last.string);
    set have;
    by string;
    array val {2} value1 value2;
    val{n}=value;
  end;
  do i=1 to n;
    output;
  end;
run;

This depends crucially on there never being more than two observations for a given string value.

More compact, but more dense, is this use of self-merge with offset:

data want (drop=nxt_:);
  merge have (rename=(value=value1))
        have (rename=(string=nxt_string value=value2)  firstobs=2);

  value1=ifn(string=lag(string),lag(value1),value1);
  value2=ifn(string=lag(string),lag(value2),ifn(string^=nxt_string,.,value2));
run;

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

lioradam · Posted 10-18-2022 12:05 AM

Hi,

Thank you very much for your answer.

When I run your code on the "example figures", it works perfectly, however, in my real data I received the following error message in the log:

ERROR: Array subscript out of range at line 40 column 5.
n=3 last.ticker_period=1 CURR=USD MEASURE=EPS PDICITY=QTR UNITS=P/S value_forcast=0.17 ticker_period=02U7202109
FIRST.ticker_period=0 value1=0.26 value2=0.23 i=. _ERROR_=1 _N_=439

This is the code I run:

data number_value (drop=i n);
do n=1 by 1 until (last.ticker_period);
set FEW_GUIDANCE;
by ticker_period;
array val {2} value1 value2;
val{n}=value_forcast;
end;
do i=1 to n;
output;
end;
run;

I also export the SAS data file into excel and add it to this message.

I don't understand why I receive this error message..

Thank you,

Lior

Tom · Posted 10-17-2022 11:09 PM

I don't understand why you would do that.

I could understand transposing the data so that groups with multiple observations get multiple values.

But then why would you write the same observation out twice? (or N times if the there are N observations in the group).

data have;
  input value string :$10.;
cards;
-0.04 000R201406
-0.03 000R201409
0.13 000R201412
0.14 000R201503
0.02 000R201509
0.01 000R201512
0.07 000R201603
0.15 000R201606
0.13 000R201609
0.1 000R201612
0.12 000R201612
0.2 000R201703
0.08 000R201706
0.08 000R201709
0.1 000R201712
0.17 000R201803
;

proc transpose data=have out=want(drop=_name_) prefix=value_ ;
  by string;
  var value;
run;

proc print data=want;
run;

Result:

Obs      string      value_1    value_2

  1    000R201406     -0.04        .
  2    000R201409     -0.03        .
  3    000R201412      0.13        .
  4    000R201503      0.14        .
  5    000R201509      0.02        .
  6    000R201512      0.01        .
  7    000R201603      0.07        .
  8    000R201606      0.15        .
  9    000R201609      0.13        .
 10    000R201612      0.10       0.12
 11    000R201703      0.20        .
 12    000R201706      0.08        .
 13    000R201709      0.08        .
 14    000R201712      0.10        .
 15    000R201803      0.17        .

lioradam · Posted 10-18-2022 01:02 AM

Hi,

You are right.

The new code that you wrote is making more sense. I will use this code instead of the first one.

Thank a lot!

Lior

lioradam · Posted 10-18-2022 02:08 AM

Hi,

Could you please assist in adjusting the code if I want also a variable for value_last?

Thanks,

Lior

Tom · Posted 10-18-2022 08:25 AM

@lioradam wrote:

Hi,

Could you please assist in adjusting the code if I want also a variable for value_last?

Thanks,

Lior

If you have the output of the proc transpose and you know how many VALUE_xx variables were created then coalesce() will do it. For example if you ended up creating 13 VALUE_xx variables then the code might look like:

data new;
  set want;
  value_last = coalesce(of value_13-value_1);
run;

If you don't know how many there were you might try just re-merging with the original dataset. In fact you might want to use the UPDATE statement instead of the MERGE statement as that will automatically collapse to only one observation per group and will also handle ignoring missing values.

proc transpose data=have out=want(drop=_name_) prefix=value_ ;
  by string;
  var value;
run;

data want;
  update want have(keep=string value rename=(value=value_last));
  by string;
run;

lioradam · Posted 10-18-2022 03:42 PM

Hi Tom,

I used the second code you wrote and it worked perfectly.

Thank you!

Lior

SAS Innovate 2025: Call for Content

Classroom Training Available!