Sashelp.us_data contains state-level population density in wide format, as shown below. Each state is also assigned to a REGION (Midwest, Northeast, South, or West).
How can I draw four lines, one per region, where the x-axis is the year (five points: 1910, 1920, 1930, 1940, and 1950) and the y-axis is the mean population density of the states in that region? Thanks in advance.
data have;
   set sashelp.us_data;
   keep statename density_1910 density_1920 density_1930 density_1940 density_1950 region;
run;
proc sort data=have;
   by region;
run;
proc print data=have;
   var statename density_1910 density_1920 density_1930 density_1940 density_1950 region;
run;
Obs  STATENAME      DENSITY_1910  DENSITY_1920  DENSITY_1930  DENSITY_1940  DENSITY_1950  REGION
  1  Illinois              101.6         116.8         137.4         142.2         156.9  Midwest
  2  Indiana                75.4          81.8          90.4          95.7         109.8  Midwest
  3  Iowa                   39.8           4.3          44.2          45.4          46.9  Midwest
  4  Kansas                 20.7          21.6           2.3           2.2          23.3  Midwest
  5  Michigan               49.7          64.9          85.6           9.3         112.7  Midwest
  6  Minnesota              26.1           3.0          32.2          35.1          37.5  Midwest
  7  Missouri               47.9          49.5          52.8          55.1          57.5  Midwest
  8  Nebraska               15.5          16.9          17.9          17.1          17.3  Midwest
  9  North Dakota            8.4           9.4           9.9           9.3           0.9  Midwest
 10  Ohio                  116.7          14.1         162.7         169.1         194.5  Midwest
 11  South Dakota            7.7           8.4           9.1           8.5           8.6  Midwest
 12  Wisconsin              43.1          48.6          54.3          57.9          63.4  Midwest
 13  Connecticut           230.2         285.1         331.8          35.3         414.5  Northeast
 14  Maine                  24.1          24.9          25.9          27.5          29.6  Northeast
 15  Massachusetts         431.6         493.9         544.8         553.4         601.3  Northeast
 16  New Hampshire          48.1          49.5           5.2          54.9          59.6  Northeast
 17  New Jersey             34.5         429.1         549.5         565.7         657.5  Northeast
 18  New York              193.4         220.4         267.1          28.6         314.7  Northeast
 19  Pennsylvania          171.3         194.9         215.3         221.3         234.6  Northeast
 20  Rhode Island          524.9         584.6          66.5          69.0          76.6  Northeast
 21  Vermont                38.6          38.2           3.9           3.9           4.1  Northeast
The following code should be clear, but ask questions if it isn't.
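data have;
   set sashelp.us_data;
   keep statename density_1910 density_1920 density_1930 density_1940 density_1950 region;
run;
/* mean of every density variable, by region */
proc means data=have noprint;
   class region;
   output out=Means mean=;
run;
/* convert to long: one row per region and year */
data Long;
   set Means(where=(_TYPE_=1));   /* _TYPE_=1 keeps the per-region rows, not the overall mean */
   /* _temporary_ keeps the lookup values out of the output data set */
   array yearArray[5] _temporary_ (1910 1920 1930 1940 1950);
   array d[*] density:;
   do i = 1 to dim(d);
      Year = yearArray[i];
      den = d[i];
      output;
   end;
   label den = "Density";
   drop i;
run;
/* one line per region */
proc sgplot data=Long;
   series x=Year y=Den / group=Region;
run;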
Somewhere along the way, you have to convert this data to long format.
Then you compute the mean for each region using PROC SUMMARY.
Then the output data set of PROC SUMMARY can be used in PROC SGPLOT to draw the four time series lines on a plot. An example of drawing multiple time series on a plot is shown here: https://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=grstatproc&docsetTarget=n...
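A minimal sketch of those three steps, in case it helps. The data set names Long and RegionMeans and the variable names Year, Density, and MeanDensity are my own choices for illustration, not anything defined in sashelp.us_data, and the year is computed from the array index on the assumption that the five density variables are listed in order from 1910 to 1950.

```sas
/* Step 1: wide to long, one row per state and year.            */
/* Long, Year and Density are names made up for this sketch.    */
data Long;
   set sashelp.us_data(keep=statename region
                            density_1910 density_1920 density_1930
                            density_1940 density_1950);
   array d[5] density_1910 density_1920 density_1930
              density_1940 density_1950;
   do i = 1 to dim(d);
      Year    = 1900 + 10*i;   /* i = 1..5 gives 1910, 1920, ..., 1950 */
      Density = d[i];
      output;
   end;
   keep statename region Year Density;
run;

/* Step 2: mean density for each region and year.               */
/* NWAY keeps only the region*Year rows in the output.          */
proc summary data=Long nway;
   class region Year;
   var Density;
   output out=RegionMeans(drop=_type_ _freq_) mean=MeanDensity;
run;

/* Step 3: one series line per region. */
proc sgplot data=RegionMeans;
   series x=Year y=MeanDensity / group=region markers;
   xaxis values=(1910 to 1950 by 10);
   yaxis label="Mean population density";
run;
```

Doing the transpose before the summary means PROC SUMMARY only has to handle one analysis variable, and the same code keeps working if more years are added to the array.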