BookmarkSubscribeRSS Feed
Phil_NZ
Barite | Level 11

Hi all SAS Users.

I want to duplicate the value in SAS as description below:

I have a dataset report_firm_year_2d

gviidkey	year	LOC	s_yc_y	        APPDAYSUM	chl_y_c	        _0_returnday	SEDOL
001166_01W	1999	NLD	-0.000066294	253	        0	            1	            5165294
001166_01W	2000	NLD	-0.000088378	249	        0	            .	            5165294
001166_01W	2001	NLD	-0.000135834	252	        0	            .	            5165294
001166_02W	1999	NLD	-0.000103182	105	        0	            1      	        5584480
001166_02W	2000	NLD	0.0007173126	143      	0.0535653833	.	            5584480
001166_02W	2001	NLD	0.0001280852	84	        0.0226349455	.	            5584480
001855_03W	1999	PHL	0.0005017105	86	        0.0447977912	2	            BD0R0N4
001855_03W	2000	PHL	0.0013135089	42	        0.0724847262	.	
001855_03W	2001	PHL	0.0075094259	24	        0.173313888	    2	            BD0R0N4

The column I want to focus here is gviidkey and SEDOL. As can be seen from the last column, the last three rows having the same gviidkey, what I want is to fill the second-to-last row of the last column by "BD0R0N4".

Could you please hint me how to do that ?

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
4 REPLIES 4
mkeintz
PROC Star

Are you saying that you merely want to populate blank values of SEDOL with whatever non-blank precedes it (within a given gviidkey)?  I.e. are you merely trying to carry forward non-missing values?  If so, then:

 

data want (drop=_:);
  set have;
  by gviidkey;
  retain _nonblnk_sedol '       ';   /*SEDOL takes 7 characters */
  if first.gviidkey=1 or sedol^=' ' then _nonblnk_sedol=sedol;
  if sedol=' ' then sedol=_nonblnk_sedol;
run;

 

Do you ever have to carry sedol back from the future?

 

Can SEDOL change over time for a given GVIIDKEY?  If so, what will you do if SEDOL is blank between two differing values from the previous and next years?

 

Edited addition: you can be a little more economical, but a bit more obscure with:

data want (drop=_tst:);
  set have (rename=(sedol=_tst_sedol));
  by gviidkey;
  if _tst_sedol^=' ' then set have (where=(sedol^=' ') keep=sedol);
run;
--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Phil_NZ
Barite | Level 11

Hi @mkeintz and @ballardw 

 

Thank you for your suggestion! I want to populate blank values of SEDOL with the SEDOL of the non-blank precedes or after it (within a given gviidkey)

 

For example

input

gviidkey       year        SEDOL
001166_01W     1999        5165294
001166_01W     2000        .
001166_01W     2001        5165294
001166_02W     1999        .
001166_02W     2000        5584480
001166_02W     2001        5584480
001855_03W     1999        .
001855_03W     2000        .
001855_03W     2001        .


The result I want is 

output

gviidkey       year        SEDOL
001166_01W     1999        5165294
001166_01W     2000        5165294
001166_01W     2001        5165294
001166_02W     1999        5584480
001166_02W     2000        5584480
001166_02W     2001        5584480
001855_03W     1999        .
001855_03W     2000        .
001855_03W     2001        .

Warmest regards.

Thank you for your help, have a fabulous and productive day! I am a novice today, but someday when I accumulate enough knowledge, I can help others in my capacity.
mkeintz
PROC Star

Assuming

  1. data are sorted by gviidkey
  2. you never see a change in the SEDOL for a given GVIIDKEY (i.e. aside from blanks, there is no variation in the SEDOL value)

then a self-merge does exactly what you want:

data want;
  merge have  
        have (keep=gviidkey sedol where=(sedol^=' '));
  by gviidkey;
run;

This does a match-merge of HAVE with a subset of HAVE.  The subset is only the non-blank SEDOL's.  Since for any given GVIIDKEY, there may be fewer non-blanks SEDOL's, their values wll be propagated through all of the "excess" records in the full sample.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
ballardw
Super User

"Second to last" is such an imprecise term and not very generic.

Let's see if another "rule" works for your data:

 

If a value is missing and the previous record is from the same "identification group" (how ever that may be defined) then replace the missing value with the previous assigned value for the variable.

 

 

If that rule is acceptable then this works for this example:

data have;
 infile datalines  truncover;
 input gviidkey :$11. year LOC $3. s_yc_y APPDAYSUM chl_y_c _0_returnday SEDOL :$7.;
datalines;
001166_01W 1999 NLD -0.000066294 253 0	 1	 5165294
001166_01W 2000 NLD -0.000088378 249 0	 .	 5165294
001166_01W 2001 NLD -0.000135834 252 0	 .	 5165294
001166_02W 1999 NLD -0.000103182 105 0	 1 	 5584480
001166_02W 2000 NLD 0.0007173126 143 0.0535653833	.	 5584480
001166_02W 2001 NLD 0.0001280852 84 0.0226349455	.	 5584480
001855_03W 1999 PHL 0.0005017105 86 0.0447977912	2	 BD0R0N4
001855_03W 2000 PHL 0.0013135089 42 0.0724847262	.	 .
001855_03W 2001 PHL 0.0075094259 24 0.173313888	 2	 BD0R0N4
;

data want;
   set have;
   by gviidkey;
   length tempvar $ 7; /*long enough to hold values of SEDOL*/
   retain tempvar;
   if first.gviidkey then call missing(tempvar);
   if not missing(sedol) then tempvar=sedol;
   if missing(sedol) then sedol=tempvar;
   drop tempvar;
run;

This will work for sequential missing values setting all of them the same.

It will not work if the FIRST value for an id group is missing however. If you have that, then you need to provide a different example and the rules involved in a bit more generic form.

 

Note that if SEDOL value is always the same for the Gviidkey value and never assigned to a different Key it is practically a redundant variable (see data normalization).

 

This question gets asked in one form or another about 4 or 5 times per week. Specific details about "first" value per group and order of processing may have an impact.

 

Please provide data in the form of a working data step. Mixed columns of tab and space separated junk takes time to clean up.

SAS Innovate 2025: Call for Content

Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!

Submit your idea!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 586 views
  • 2 likes
  • 3 in conversation