Missing Years for some raws

BhargavDesai · Posted 01-27-2020 02:04 PM

Hello

I have dataset in following format.

Partner Year Value

Algeria 2010 20.00

Algeria 2011 40.00

Afghanistan 2013 40.00

Afghanistan 2017 30.00

China 2010 20.00

China 2011 20.00

China 2012 40.00

China 2013 25.00

China 2014 30.00

China 2015 20.00

...

---

--

What I am looking for is For each country, I want to have data raw for each year (2010 to 2019) with 0 if the value is missing.

Thanks

PeterClemmensen · Posted 01-27-2020 02:26 PM

data have;
input Partner :$20. Year Value;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;

data want(drop=rc);
   if _N_=1 then do;
      declare hash h (dataset : 'have');
      h.definekey ('Partner', 'Year');
      h.definedata ('Value');
      h.definedone();
   end;
   
   set have;
   by Partner notsorted;

   if first.Partner then do Year=2010 to 2019;
      Value=0;
      rc = h.find();
      output;
   end;
run;

The DATA to DATA Step Macro
Blog: SASnrd

r_behata · Posted 01-27-2020 02:45 PM

data have;
input Partner $12. Year     Value;
cards;
Algeria          2010    20.00
Algeria          2011    40.00
Afghanistan      2013    40.00
Afghanistan      2017    30.00
China            2010    20.00
China            2011    20.00
China            2012    40.00
China            2013    25.00
China            2014    30.00
China            2015    20.00
run;


proc freq data=have noprint;
	tables Partner * Year / sparse out=want0(drop=count percent);
run;

proc sort data=want0;
	by Partner Year;
run;


proc sort data=have;
	by Partner Year;
run;

data want1;
	merge have want0;
	by Partner Year;
run;

proc stdize data=want1 out=want reponly missing=0;
   var Value;
run;

mkeintz · Posted 01-27-2020 03:26 PM

Although I am a great user of hash object, I would suggest a single data step that uses 2 lock-step input data streams:

A "SET" stream, keeping only the PARTNER and YEAR variables. It is used only when starting a new PARTNER. If the first YEAR is greater than 2010 then output a set of dummy records up to, but not including the first observed year.
A self-MERGE that reads in ALL the variables for the current record, as well as the following YEAR (renamed to NXT_YEAR) to see if there are any upcoming gaps in YEAR. This MERGE provides the data to.
1. output the record for the current year
2. create and output any needed dummy records up to either NXT_YEAR-1, or to 2019, whichever is appropriate.

data have;
  input Partner :$20. Year Value ;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;

data want (drop=sentinel: nxt_year );

  set have (keep=partner year);
  by partner notsorted;
  if first.partner=1 and year>2010 then do year=2010 to year-1;
    output;
  end;

  retain sentinel1 .;
  merge have   have (firstobs=2 keep=year rename=(year=nxt_year));
  retain sentinel2 .;
  do year=year to ifn(last.partner,2019,nxt_year-1);
    output;
    call missing(of sentinel1 -- sentinel2);
  end;
run;

This program assumes that records within each PARTNER, are sorted by YEAR, but PARTNER need not be sorted.

Note it takes advantage of the fact that by declaring SENTINEL1 and SENTINEL2 variables just before and after the MERGE statement, they are placed just to the left and to the right of all the incoming variables (except PARTNER and YEAR). That in turn makes creating a dummy record (i.e. all variables set to missing EXCEPT partner and year) easy - just by using CALL MISSING(of sentinel1 -- sentinel2).

The IFN statement sets the upper-limit of year for the DO loop. If the record-in-hand is the last for a partner output every year to 2019, otherwise only go up to NXT_YEAR-1.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------

Ksharp · Posted 01-28-2020 12:48 AM

data have;
input Partner :$20. Year Value;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;
data temp;
 do year=2010 to 2019;
  output;
 end;
run;

proc sql;
 select a.*,coalesce(value,0) as value
  from (select * from 
  (select distinct Partner from have),(select distinct year from temp)) as a
 left join have as b on a.Partner=b.Partner and a.year=b.year;
quit;

Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

Re: Missing Years for some raws

SAS Innovate 2025: Register Now

SAS Training: Just a Click Away