BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
jungjein
Calcite | Level 5

I have an excel file with data looking like this:

Screen Shot 2019-01-25 at 9.22.40 pm.png

 

The data shows the company and it's operating locations.

How do I fill in those rows with missing Company and Profit with the previous values but if it hits a row with non-missing Company and Profit then it stops?

i.e. rows 3 to 8 should be Company='Microsoft' and Profit=1000

row 10 should be Company='Apple' and Profit=2000

 

1 ACCEPTED SOLUTION

Accepted Solutions
PeterClemmensen
Tourmaline | Level 20
data have;
informat Company $20.Profit 8. Location $20.;
input Company $ Profit Location;
infile datalines dlm=',' dsd missover;
datalines;
Microsoft,1000,United States
,,India
,,Hong Kong
,,United Kingdom
,,Ireland
,,Korea
,,Japan
Apple,2000,United States
,,China
Alphabet,20,United States
,,Netherlands
,,Ireland
,,China
Samsung,3,Korea
,,Japan
,,China
;

data temp;
   set have;
   if not missing(Company) then group+1;
run;

data want(drop=group);
   update temp(obs=0) temp;
   by group;
   output;
run;

View solution in original post

7 REPLIES 7
Kurt_Bremser
Super User

Use retained variables:

data want;
set have;
retain _company _profit;
if company ne ''
then _company = company;
else company = _company;
if profit ne .
then _profit = profit;
else profit = _profit;
drop _company _profit;
run;
Ali_Ouda
Calcite | Level 5

Hello 
Please Can you tell me what's the solution if the missing was before ?
If I want to fill the missing from the next row

---------------------------------------------------------------------

 

Ali_Ouda
Calcite | Level 5
Thank you very much.
PeterClemmensen
Tourmaline | Level 20
data have;
informat Company $20.Profit 8. Location $20.;
input Company $ Profit Location;
infile datalines dlm=',' dsd missover;
datalines;
Microsoft,1000,United States
,,India
,,Hong Kong
,,United Kingdom
,,Ireland
,,Korea
,,Japan
Apple,2000,United States
,,China
Alphabet,20,United States
,,Netherlands
,,Ireland
,,China
Samsung,3,Korea
,,Japan
,,China
;

data temp;
   set have;
   if not missing(Company) then group+1;
run;

data want(drop=group);
   update temp(obs=0) temp;
   by group;
   output;
run;
novinosrin
Tourmaline | Level 20

If your pattern holds true in all cases, the group creation and update will work shown by draycut and so will mine. But if your pattern differs, the best "controlled approach" by @Kurt_Bremser  is probably the safest.

 

Just to start my morning with some hash fun, here you go->

 

data have;
informat Company $20.Profit 8. Location $20.;
input Company $ Profit Location;
infile datalines dlm=',' dsd missover;
datalines;
Microsoft,1000,United States
,,India
,,Hong Kong
,,United Kingdom
,,Ireland
,,Korea
,,Japan
Apple,2000,United States
,,China
Alphabet,20,United States
,,Netherlands
,,Ireland
,,China
Samsung,3,Korea
,,Japan
,,China
;

data want;
if _n_=1 then do;
   dcl hash H () ;
   h.definekey  ("_iorc_") ;
   h.definedata ("company", "profit") ;
   h.definedone () ;
   end;
set have;
if not missing(company) and not missing(profit) then do;
_iorc_+1;
h.add();
end;
else h.find();
run;

Thank you @PeterClemmensen  for creating the data have sample. 

mkeintz
PROC Star

If

  1.  You have lots of variables that could be missing, not just 2 or 3, making the RETAIN statement approach tedious.
             and
  2.  Those variables are always missing simultaneously.

then you could take advantage of the SET with POINT= statement, and the DROP and RENAME parameters.

 

Here's an example using a dataset HAVE from SASHELP.CLASS in which each obs is completely written once, and then written 3 times with missing values SEX, WEIGHT, and HEIGHT.  Those latter records need the data to be carried forward from the non-missing collection.

 

data have ;
  set sashelp.class;
  output;
  call missing(sex, weight,height);
  do copy=1 to 3; output; end;
run;

data want (drop=_sex);
  set have (drop=weight height  rename=(sex=_sex));
  if _sex^=' ' then set have (keep=sex height weight) point=_n_;
run;

 

 

 

 

 

  1.  The SET HAVE statement reads in all the variables except the variables in question.  But it does read in and renames one of those variables (sex renamed to _sex).  The provide a single data point to determine whether all the variables-of-interest are missing.
  2.  The "IF ... THEN SET" statement only occurs when the variables-of-interest are actually present.  Because they are read by a SET statement they are automatically retained (and therefore output) until the next non-missing collection is encountered.

Note the DATA WANT step only takes 4 lines, no matter how many variables are "variables-of-interest".

 

But I repeat, this is for cases in which you know those variables are synchronized in there missing value occurances.  Of course you could break this approach down if you had, say, 2 distinct collections of such variables.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

hackathon24-white-horiz.png

2025 SAS Hackathon: There is still time!

Good news: We've extended SAS Hackathon registration until Sept. 12, so you still have time to be part of our biggest event yet – our five-year anniversary!

Register Now

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 7 replies
  • 6526 views
  • 4 likes
  • 6 in conversation