BookmarkSubscribeRSS Feed
BhargavDesai
Calcite | Level 5

Hello

 

I have dataset in following format.

 

Partner            Year     Value

Algeria             2010    20.00

Algeria             2011    40.00

Afghanistan      2013   40.00

Afghanistan      2017   30.00

China               2010    20.00

China               2011    20.00

China               2012    40.00

China               2013    25.00

China               2014    30.00

China               2015    20.00

...

---

--

 

What I am looking for is For each country,  I want to have data raw for each year (2010 to 2019)  with 0 if the value is missing.

 

Thanks

4 REPLIES 4
PeterClemmensen
Tourmaline | Level 20
data have;
input Partner :$20. Year Value;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;

data want(drop=rc);
   if _N_=1 then do;
      declare hash h (dataset : 'have');
      h.definekey ('Partner', 'Year');
      h.definedata ('Value');
      h.definedone();
   end;
   
   set have;
   by Partner notsorted;

   if first.Partner then do Year=2010 to 2019;
      Value=0;
      rc = h.find();
      output;
   end;
run;
r_behata
Barite | Level 11
data have;
input Partner $12. Year     Value;
cards;
Algeria          2010    20.00
Algeria          2011    40.00
Afghanistan      2013    40.00
Afghanistan      2017    30.00
China            2010    20.00
China            2011    20.00
China            2012    40.00
China            2013    25.00
China            2014    30.00
China            2015    20.00
run;


proc freq data=have noprint;
	tables Partner * Year / sparse out=want0(drop=count percent);
run;

proc sort data=want0;
	by Partner Year;
run;


proc sort data=have;
	by Partner Year;
run;

data want1;
	merge have want0;
	by Partner Year;
run;

proc stdize data=want1 out=want reponly missing=0;
   var Value;
run;
mkeintz
PROC Star

 

Although I am a great user of hash object, I would suggest a single data step that uses 2 lock-step input data streams:


  1. A "SET" stream, keeping only the PARTNER and YEAR variables.  It is used only when starting a new PARTNER.  If the first YEAR is greater than 2010 then output a set of dummy records up to, but not including the first observed year.
  2. A self-MERGE that reads in ALL the variables for the current record, as well as the following YEAR (renamed to NXT_YEAR) to see if there are any upcoming gaps in YEAR.  This MERGE provides the data to.
    1. output the record for the current year
    2. create and output any needed dummy records up to either NXT_YEAR-1, or to 2019, whichever is appropriate.

 

data have;
  input Partner :$20. Year Value ;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;

data want (drop=sentinel: nxt_year );

  set have (keep=partner year);
  by partner notsorted;
  if first.partner=1 and year>2010 then do year=2010 to year-1;
    output;
  end;

  retain sentinel1 .;
  merge have   have (firstobs=2 keep=year rename=(year=nxt_year));
  retain sentinel2 .;
  do year=year to ifn(last.partner,2019,nxt_year-1);
    output;
    call missing(of sentinel1 -- sentinel2);
  end;
run;

This program assumes that records within each PARTNER, are sorted by YEAR, but PARTNER need not be sorted.

 

Note it takes advantage of the fact that by declaring SENTINEL1 and SENTINEL2 variables just before and after the MERGE statement, they are placed just to the left and to the right of all the incoming variables (except PARTNER and YEAR).  That in turn makes creating a dummy record (i.e. all variables set to missing EXCEPT partner and year) easy - just by using CALL MISSING(of sentinel1 -- sentinel2).

 

The IFN statement sets the upper-limit of year for the DO loop.  If the record-in-hand is the last for a partner output every year to 2019, otherwise only go up to NXT_YEAR-1.

--------------------------
The hash OUTPUT method will overwrite a SAS data set, but not append. That can be costly. Consider voting for Add a HASH object method which would append a hash object to an existing SAS data set

Would enabling PROC SORT to simultaneously output multiple datasets be useful? Then vote for
Allow PROC SORT to output multiple datasets

--------------------------
Ksharp
Super User
data have;
input Partner :$20. Year Value;
datalines;
Algeria 2010 20.00
Algeria 2011 40.00
Afghanistan 2013 40.00
Afghanistan 2017 30.00
China 2010 20.00
China 2011 20.00
China 2012 40.00
China 2013 25.00
China 2014 30.00
China 2015 20.00
;
data temp;
 do year=2010 to 2019;
  output;
 end;
run;

proc sql;
 select a.*,coalesce(value,0) as value
  from (select * from 
  (select distinct Partner from have),(select distinct year from temp)) as a
 left join have as b on a.Partner=b.Partner and a.year=b.year;
quit;

SAS Innovate 2025: Register Now

Registration is now open for SAS Innovate 2025 , our biggest and most exciting global event of the year! Join us in Orlando, FL, May 6-9.
Sign up by Dec. 31 to get the 2024 rate of just $495.
Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 4 replies
  • 560 views
  • 0 likes
  • 5 in conversation