BookmarkSubscribeRSS Feed
subhani4
Obsidian | Level 7

i have a dataset  as below;

data city;
input TIMEZONE:$20. STATE:$20. CITY:$20. POPULATION;
infile datalines dlm=' ' dsd;
datalines;
Alaska Alaska Anchorage 291826
Atlantic Puerto Rico San Antonio 1327407
Atlantic Puerto Rico San Juan 395326
Central Minnesota Minneapolis 382578
Central Texas Dallas 1197816
Eastern Florida Miami 399457
Eastern Maryland Baltimore 620961
Eastern Michigan Detroit 713777
Eastern New York New York 8175133
Mountain Arizona Phoenix 1445631
Mountain Colorado Denver 600158
Mountain Texas El Paso 649121
Pacific California Los Angeles 3792621
Pacific California Sacramento 466488
Pacific California San Francisco 805235
Pacific Nevada Las Vegas 583756
;

 

the final output should be like 

 

TIMEZONE

STATE

CITY

POPULATION

CUMUL_POP

Alaska

Alaska

Anchorage

291826

291826

Alaska

Alaska

Total

291826

291826

Alaska

Total

 

291826

291826

Atlantic

Puerto Rico

San Antonio

1327407

1619233

Atlantic

Puerto Rico

San Juan

395326

2014559

Atlantic

Puerto Rico

Total

1722733

2014559

Atlantic

Total

 

1722733

2014559

Central

Minnesota

Minneapolis

382578

2397137

Central

Minnesota

Total

382578

2397137

Central

Texas

Dallas

1197816

3594953

Central

Texas

Total

1197816

3594953

Central

Total

 

1580394

3594953

Eastern

Florida

Miami

399457

3994410

Eastern

Florida

Total

399457

3994410

Eastern

Maryland

Baltimore

620961

4615371

Eastern

Maryland

Total

620961

4615371

Eastern

Michigan

Detroit

713777

5329148

Eastern

Michigan

Total

713777

5329148

Eastern

New York

New York

8175133

13504281

Eastern

New York

Total

8175133

13504281

Eastern

Total

 

9909328

13504281

Mountain

Arizona

Phoenix

1445631

14949912

Mountain

Arizona

Total

1445631

14949912

Mountain

Colorado

Denver

600158

15550070

Mountain

Colorado

Total

600158

15550070

Mountain

Texas

El Paso

649121

16199191

Mountain

Texas

Total

649121

16199191

Mountain

Total

 

2694910

16199191

Pacific

California

Los Angeles

3792621

19991812

Pacific

California

Sacramento

466488

20458300

Pacific

California

San Francisco

805235

21263535

Pacific

California

Total

5064344

21263535

Pacific

Nevada

Las Vegas

583756

21847291

Pacific

Nevada

Total

583756

21847291

Pacific

Total

 

5648100

21847291

 

i have coded as below but not able to get the intermidiate summary:

 

proc sort data=city; by timezone state; run;

data want;
set city;
by timezone state;
if first.state then cum_pop=0;
cum_pop+population;
run;

 

could anyone please provide the solution both in proc sql as well as data step. Thanks in advance!!

2 REPLIES 2
Tom
Super User Tom
Super User

Seems pretty straight forward.

data city;
  infile datalines dlm=' ' dsd;
  input TIMEZONE :$20. STATE :$20. CITY :$20. POPULATION;
datalines;
Alaska Alaska Anchorage 291826
Atlantic "Puerto Rico" "San Antonio" 1327407
Atlantic "Puerto Rico" "San Juan" 395326
Central Minnesota Minneapolis 382578
Central Texas Dallas 1197816
Eastern Florida Miami 399457
Eastern Maryland Baltimore 620961
Eastern Michigan Detroit 713777
Eastern "New York" "New York" 8175133
Mountain Arizona Phoenix 1445631
Mountain Colorado Denver 600158
Mountain Texas "El Paso" 649121
Pacific California "Los Angeles" 3792621
Pacific California Sacramento 466488
Pacific California "San Francisco" 805235
Pacific Nevada "Las Vegas" 583756
;

data want;
  set city ;
  by timezone state  ;
  if first.state then total=0;
  if first.timezone then tz_total=0;
  total+population;
  tz_total+population;
  cumul_pop+population;
  output;
  if last.state then do;
     city='TOTAL';
     population=total;
     output;
  end;
  if last.timezone then do;
     state='TOTAL';
     city=' ';
     population=tz_total;
     output;
  end;
  drop total tz_total;
run;
     
Sajid01
Meteorite | Level 14

Hello @subhani4 
@Tom  has provided the data step approach.
Using SQL typically needs the Windowing approach (OVER(0... PARTITION BY...   ORDER BY )

This is not available in SAS.

SAS Innovate 2025: Save the Date

 SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!

Save the date!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

SAS Training: Just a Click Away

 Ready to level-up your skills? Choose your own adventure.

Browse our catalog!

Discussion stats
  • 2 replies
  • 387 views
  • 0 likes
  • 3 in conversation