Hi all,
I'm after some assistance.
I need to fill in the years in between the observations in the following dataset:
data have;
input Country $ Year Disaster $;
datalines;
Afghanistan 1990 Flood
Afghanistan 1993 Epidemic
...
Afghanistan 2020 Storm
Albania 1992 Landslide
Albania 1994 Storm
...
Albania 2014 Epidemic
...
I would like to have all the observation for each country ranging from year 1990 to 2020, with a value "None" on the variable disaster for the years in which no disaster was registered. For example:
datalines;
Afghanistan 1990 Flood
Afghanistan 1991 None
Afghanistan 1992 None
Afghanistan 1993 Epidemic
...
Afghanistan 2020 Storm
Albania 1990 None
Albania 1991 None
Albania 1992 Landslide
Albania 1993 None
Albania 1994 Storm
...
Albania 2014 Epidemic
...
Albania 2020 None
Any help would be greatly appreciated .
Create a template having all country/year combinations, then update it with have:
data have;
input Country :$20. Year Disaster :$10.;
datalines;
Afghanistan 1990 Flood
Afghanistan 1993 Epidemic
Afghanistan 2020 Storm
Albania 1992 Landslide
Albania 1994 Storm
Albania 2014 Epidemic
;
data years;
do year = 1990 to 2020;
output;
end;
run;
proc sql;
create table template as
select distinct
have.country,
years.year,
"None" as disaster length=10
from have, years
;
quit;
data want;
update template have;
by country year;
run;
Try this
data have;
input Country $1 - 11 Year Disaster :$10.;
datalines;
Afghanistan 1990 Flood
Afghanistan 1993 Epidemic
Afghanistan 2020 Storm
Albania 1992 Landslide
Albania 1994 Storm
Albania 2014 Epidemic
;
data want;
if _N_ = 1 then do;
dcl hash h(dataset : 'have');
h.definekey('Country', 'Year');
h.definedata('Disaster');
h.definedone();
end;
set have;
by Country;
if first.Country then do;
do Year = 1990 to 2020;
if h.find() then Disaster = 'None';
output;
end;
end;
run;
Create a template having all country/year combinations, then update it with have:
data have;
input Country :$20. Year Disaster :$10.;
datalines;
Afghanistan 1990 Flood
Afghanistan 1993 Epidemic
Afghanistan 2020 Storm
Albania 1992 Landslide
Albania 1994 Storm
Albania 2014 Epidemic
;
data years;
do year = 1990 to 2020;
output;
end;
run;
proc sql;
create table template as
select distinct
have.country,
years.year,
"None" as disaster length=10
from have, years
;
quit;
data want;
update template have;
by country year;
run;
This is a case where use of a temporary array of disasters, indexed by year, and defaulting to 'None" is a nice solution:
data have;
input Country $ Year Disaster :$9.;
datalines;
Afghanistan 1990 Flood
Afghanistan 1993 Epidemic
Afghanistan 2020 Storm
Albania 1992 Landslide
Albania 1994 Storm
run;
data want;
set have;
by country;
array dhist {1990:2020} $9 _temporary_ (31*'None');
dhist{year}=disaster;
if last.country then do year=1990 to 2020;
disaster=dhist{year};
output;
dhist{year}='None';
end;
run;
The _temporary_ array will have its values (i.e. mostly a series of 'None') retained from one observation to the next.
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
SAS' Charu Shankar shares her PROC SQL expertise by showing you how to master the WHERE clause using real winter weather data.
Find more tutorials on the SAS Users YouTube channel.