Hi all SAS Users.
I want to duplicate the value in SAS as description below:
I have a dataset report_firm_year_2d
gviidkey	year	LOC	s_yc_y	        APPDAYSUM	chl_y_c	        _0_returnday	SEDOL
001166_01W	1999	NLD	-0.000066294	253	        0	            1	            5165294
001166_01W	2000	NLD	-0.000088378	249	        0	            .	            5165294
001166_01W	2001	NLD	-0.000135834	252	        0	            .	            5165294
001166_02W	1999	NLD	-0.000103182	105	        0	            1      	        5584480
001166_02W	2000	NLD	0.0007173126	143      	0.0535653833	.	            5584480
001166_02W	2001	NLD	0.0001280852	84	        0.0226349455	.	            5584480
001855_03W	1999	PHL	0.0005017105	86	        0.0447977912	2	            BD0R0N4
001855_03W	2000	PHL	0.0013135089	42	        0.0724847262	.	
001855_03W	2001	PHL	0.0075094259	24	        0.173313888	    2	            BD0R0N4The column I want to focus here is gviidkey and SEDOL. As can be seen from the last column, the last three rows having the same gviidkey, what I want is to fill the second-to-last row of the last column by "BD0R0N4".
Could you please hint me how to do that ?
Warmest regards.
Are you saying that you merely want to populate blank values of SEDOL with whatever non-blank precedes it (within a given gviidkey)? I.e. are you merely trying to carry forward non-missing values? If so, then:
data want (drop=_:);
  set have;
  by gviidkey;
  retain _nonblnk_sedol '       ';   /*SEDOL takes 7 characters */
  if first.gviidkey=1 or sedol^=' ' then _nonblnk_sedol=sedol;
  if sedol=' ' then sedol=_nonblnk_sedol;
run;
Do you ever have to carry sedol back from the future?
Can SEDOL change over time for a given GVIIDKEY? If so, what will you do if SEDOL is blank between two differing values from the previous and next years?
Edited addition: you can be a little more economical, but a bit more obscure with:
data want (drop=_tst:);
  set have (rename=(sedol=_tst_sedol));
  by gviidkey;
  if _tst_sedol^=' ' then set have (where=(sedol^=' ') keep=sedol);
run;
Thank you for your suggestion! I want to populate blank values of SEDOL with the SEDOL of the non-blank precedes or after it (within a given gviidkey)
For example
input
gviidkey       year        SEDOL
001166_01W     1999        5165294
001166_01W     2000        .
001166_01W     2001        5165294
001166_02W     1999        .
001166_02W     2000        5584480
001166_02W     2001        5584480
001855_03W     1999        .
001855_03W     2000        .
001855_03W     2001        .
The result I want is
output
gviidkey       year        SEDOL
001166_01W     1999        5165294
001166_01W     2000        5165294
001166_01W     2001        5165294
001166_02W     1999        5584480
001166_02W     2000        5584480
001166_02W     2001        5584480
001855_03W     1999        .
001855_03W     2000        .
001855_03W     2001        .Warmest regards.
Assuming
then a self-merge does exactly what you want:
data want;
  merge have  
        have (keep=gviidkey sedol where=(sedol^=' '));
  by gviidkey;
run;
This does a match-merge of HAVE with a subset of HAVE. The subset is only the non-blank SEDOL's. Since for any given GVIIDKEY, there may be fewer non-blanks SEDOL's, their values wll be propagated through all of the "excess" records in the full sample.
"Second to last" is such an imprecise term and not very generic.
Let's see if another "rule" works for your data:
If a value is missing and the previous record is from the same "identification group" (how ever that may be defined) then replace the missing value with the previous assigned value for the variable.
If that rule is acceptable then this works for this example:
data have; infile datalines truncover; input gviidkey :$11. year LOC $3. s_yc_y APPDAYSUM chl_y_c _0_returnday SEDOL :$7.; datalines; 001166_01W 1999 NLD -0.000066294 253 0 1 5165294 001166_01W 2000 NLD -0.000088378 249 0 . 5165294 001166_01W 2001 NLD -0.000135834 252 0 . 5165294 001166_02W 1999 NLD -0.000103182 105 0 1 5584480 001166_02W 2000 NLD 0.0007173126 143 0.0535653833 . 5584480 001166_02W 2001 NLD 0.0001280852 84 0.0226349455 . 5584480 001855_03W 1999 PHL 0.0005017105 86 0.0447977912 2 BD0R0N4 001855_03W 2000 PHL 0.0013135089 42 0.0724847262 . . 001855_03W 2001 PHL 0.0075094259 24 0.173313888 2 BD0R0N4 ; data want; set have; by gviidkey; length tempvar $ 7; /*long enough to hold values of SEDOL*/ retain tempvar; if first.gviidkey then call missing(tempvar); if not missing(sedol) then tempvar=sedol; if missing(sedol) then sedol=tempvar; drop tempvar; run;
This will work for sequential missing values setting all of them the same.
It will not work if the FIRST value for an id group is missing however. If you have that, then you need to provide a different example and the rules involved in a bit more generic form.
Note that if SEDOL value is always the same for the Gviidkey value and never assigned to a different Key it is practically a redundant variable (see data normalization).
This question gets asked in one form or another about 4 or 5 times per week. Specific details about "first" value per group and order of processing may have an impact.
Please provide data in the form of a working data step. Mixed columns of tab and space separated junk takes time to clean up.
It's finally time to hack! Remember to visit the SAS Hacker's Hub regularly for news and updates.
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.
