Hi all SAS Users.
I want to duplicate the value in SAS as description below:
I have a dataset report_firm_year_2d
gviidkey year LOC s_yc_y APPDAYSUM chl_y_c _0_returnday SEDOL
001166_01W 1999 NLD -0.000066294 253 0 1 5165294
001166_01W 2000 NLD -0.000088378 249 0 . 5165294
001166_01W 2001 NLD -0.000135834 252 0 . 5165294
001166_02W 1999 NLD -0.000103182 105 0 1 5584480
001166_02W 2000 NLD 0.0007173126 143 0.0535653833 . 5584480
001166_02W 2001 NLD 0.0001280852 84 0.0226349455 . 5584480
001855_03W 1999 PHL 0.0005017105 86 0.0447977912 2 BD0R0N4
001855_03W 2000 PHL 0.0013135089 42 0.0724847262 .
001855_03W 2001 PHL 0.0075094259 24 0.173313888 2 BD0R0N4
The column I want to focus here is gviidkey and SEDOL. As can be seen from the last column, the last three rows having the same gviidkey, what I want is to fill the second-to-last row of the last column by "BD0R0N4".
Could you please hint me how to do that ?
Warmest regards.
Are you saying that you merely want to populate blank values of SEDOL with whatever non-blank precedes it (within a given gviidkey)? I.e. are you merely trying to carry forward non-missing values? If so, then:
data want (drop=_:);
set have;
by gviidkey;
retain _nonblnk_sedol ' '; /*SEDOL takes 7 characters */
if first.gviidkey=1 or sedol^=' ' then _nonblnk_sedol=sedol;
if sedol=' ' then sedol=_nonblnk_sedol;
run;
Do you ever have to carry sedol back from the future?
Can SEDOL change over time for a given GVIIDKEY? If so, what will you do if SEDOL is blank between two differing values from the previous and next years?
Edited addition: you can be a little more economical, but a bit more obscure with:
data want (drop=_tst:);
set have (rename=(sedol=_tst_sedol));
by gviidkey;
if _tst_sedol^=' ' then set have (where=(sedol^=' ') keep=sedol);
run;
Thank you for your suggestion! I want to populate blank values of SEDOL with the SEDOL of the non-blank precedes or after it (within a given gviidkey)
For example
input
gviidkey year SEDOL
001166_01W 1999 5165294
001166_01W 2000 .
001166_01W 2001 5165294
001166_02W 1999 .
001166_02W 2000 5584480
001166_02W 2001 5584480
001855_03W 1999 .
001855_03W 2000 .
001855_03W 2001 .
The result I want is
output
gviidkey year SEDOL
001166_01W 1999 5165294
001166_01W 2000 5165294
001166_01W 2001 5165294
001166_02W 1999 5584480
001166_02W 2000 5584480
001166_02W 2001 5584480
001855_03W 1999 .
001855_03W 2000 .
001855_03W 2001 .
Warmest regards.
Assuming
then a self-merge does exactly what you want:
data want;
merge have
have (keep=gviidkey sedol where=(sedol^=' '));
by gviidkey;
run;
This does a match-merge of HAVE with a subset of HAVE. The subset is only the non-blank SEDOL's. Since for any given GVIIDKEY, there may be fewer non-blanks SEDOL's, their values wll be propagated through all of the "excess" records in the full sample.
"Second to last" is such an imprecise term and not very generic.
Let's see if another "rule" works for your data:
If a value is missing and the previous record is from the same "identification group" (how ever that may be defined) then replace the missing value with the previous assigned value for the variable.
If that rule is acceptable then this works for this example:
data have; infile datalines truncover; input gviidkey :$11. year LOC $3. s_yc_y APPDAYSUM chl_y_c _0_returnday SEDOL :$7.; datalines; 001166_01W 1999 NLD -0.000066294 253 0 1 5165294 001166_01W 2000 NLD -0.000088378 249 0 . 5165294 001166_01W 2001 NLD -0.000135834 252 0 . 5165294 001166_02W 1999 NLD -0.000103182 105 0 1 5584480 001166_02W 2000 NLD 0.0007173126 143 0.0535653833 . 5584480 001166_02W 2001 NLD 0.0001280852 84 0.0226349455 . 5584480 001855_03W 1999 PHL 0.0005017105 86 0.0447977912 2 BD0R0N4 001855_03W 2000 PHL 0.0013135089 42 0.0724847262 . . 001855_03W 2001 PHL 0.0075094259 24 0.173313888 2 BD0R0N4 ; data want; set have; by gviidkey; length tempvar $ 7; /*long enough to hold values of SEDOL*/ retain tempvar; if first.gviidkey then call missing(tempvar); if not missing(sedol) then tempvar=sedol; if missing(sedol) then sedol=tempvar; drop tempvar; run;
This will work for sequential missing values setting all of them the same.
It will not work if the FIRST value for an id group is missing however. If you have that, then you need to provide a different example and the rules involved in a bit more generic form.
Note that if SEDOL value is always the same for the Gviidkey value and never assigned to a different Key it is practically a redundant variable (see data normalization).
This question gets asked in one form or another about 4 or 5 times per week. Specific details about "first" value per group and order of processing may have an impact.
Please provide data in the form of a working data step. Mixed columns of tab and space separated junk takes time to clean up.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.