BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
friendguy13
Calcite | Level 5

I am working with a data set of water samples that were tested for the presence of legionella using PCR.  I most cases multiple samples were collected from different sites in the same building and all of the rows of the table contain a building ID indicating which building they are from.  I am looking to subset the data by building ID, build a new row of data for each subset consisting of the combined stats from all the rows of the subset, and then merge the newly created rows into a new data set.  I know I can subset the data by using "by Building_ID" but I am totally lost on creating and merging a new data set from those subsets.

 

For reference my data looks something like this:

Sample IDBuilding IDFacility TypeLegionella PresentLegionella LoadBuilding Floors
11115001
211001
322003
432160005
532005
632005
74112002
84114002

 

And I need my output to look something like this:

Building IDLegionella PresentLegionella Load MaxLegionella Load MinLegionella Load AverageBuilding Floors
1150002501
200003
316000020005
414002003002

 

1 ACCEPTED SOLUTION

Accepted Solutions
LittlesasMaster
Obsidian | Level 7

You can also try simple proc sql query.

/* Created sample data here */

proc import datafile='C:\Users\Dharmu\Desktop\Minu SAS Training\Water.xlsx' out=have dbms=xlsx replace;run;


proc sort data=have;by building_id Legionella_Load;run;

 

proc sql;
create table want as
select Building_ID,
max(Legionella_Present) as Legionella_Present,
min(Legionella_Load) as Min_Legionella_Load,
max(Legionella_Load) as Max_Legionella_Load,
Avg(Legionella_Load) as Avg_Legionella_Load,
max(building_floors) as building_floors
from have group by Building_ID;
quit;

 

proc print data=want;run;

 

View solution in original post

5 REPLIES 5
andreas_lds
Jade | Level 19

Assuming that Legionella_Present and Legionella_Load are numeric:

proc summary data=work.have nway;
   class Building_ID;
   id Building_Floors;
   var  Legionella_Present Legionella_Load;
   output out= work.calc(drop= _type_ _freq_) max= min(Legionella_Load)= mean(Legionella_Load)= /autoname;
run;
LittlesasMaster
Obsidian | Level 7

You can also try simple proc sql query.

/* Created sample data here */

proc import datafile='C:\Users\Dharmu\Desktop\Minu SAS Training\Water.xlsx' out=have dbms=xlsx replace;run;


proc sort data=have;by building_id Legionella_Load;run;

 

proc sql;
create table want as
select Building_ID,
max(Legionella_Present) as Legionella_Present,
min(Legionella_Load) as Min_Legionella_Load,
max(Legionella_Load) as Max_Legionella_Load,
Avg(Legionella_Load) as Avg_Legionella_Load,
max(building_floors) as building_floors
from have group by Building_ID;
quit;

 

proc print data=want;run;

 

tagawa_sas
Fluorite | Level 6

You could potentially use a PROC REPORT function to get the data pretty close to what you want. 

 

DATA WORK.QA;
INFILE DATALINES;
INPUT SampleID
BuildingID
FacilityType
LegionellaPresent
LegioellaLoad
Buildingfloors
;
DATALINES;
1 1 1 1 500 1
2 1 1 0 0 1
3 2 2 0 0 3
4 3 2 1 6000 5
5 3 2 0 0 5
6 3 2 0 0 5
7 4 1 1 200 2
8 4 1 1 400 2
;

 

So based off the data that you gave us, I read the data in by using datalines, so the names of the variables are listed above. After that I used a PROC REPORT function to get the table.

 

 

PROC REPORT DATA = WORK.QA
SPLIT = "/";
COLUMN BuildingID LegionellaPresent LegioellaLoad,(MAX MIN MEAN) Buildingfloors;

DEFINE BuildingID / GROUP "Building/ID";
DEFINE LegionellaPresent / "Legionella/Present";
DEFINE Buildingfloors / GROUP "Building/floors";
DEFINE LegioellaLoad / "Legioella Load";
RUN;

 

So here I am creating a table with the columns as the variable names for BuildingID LegionellaPresent LegioellaLoad, Buildingfloors. For LegioellaLoad I asked SAS to find the max, min, and mean.

 

Then I grouped the data by BuildingID and Buildingfloors, and renamed all the variables, so that they somewhat matched the description above.

 

At the end, the table should look something like this.

  Legioella Load  BuildingID LegionellaPresent MAX MIN MEAN Buildingfloors
1150002501
200003
316000020005
424002003002

 

Hope this helps!

 

hhinohar
Quartz | Level 8
data have;
	infile datalines dlm="09"x;
	input Sample_ID Building_ID Facility_Type Legionella_Present Legionella_Load 
		Building_Floors;
	datalines;
1	1	1	1	500	1
2	1	1	0	0	1
3	2	2	0	0	3
4	3	2	1	6000	5
5	3	2	0	0	5
6	3	2	0	0	5
7	4	1	1	200	2
8	4	1	1	400	2
;
run;

data want(keep=Building_ID
	       Legionella_Present_max
	       Legionella_load_max
	       Legionella_load_min
	       Legionella_load_Average
	       Building_Floors rename=(Legionella_Present_max=Legionella_Present)
);
	*keep variables ordered using length statement;
	length Building_ID 
	       Legionella_Present_max 
	       Legionella_load_max 
	       Legionella_load_min 
	       Legionella_load_Average
	       Building_Floors 8;
	*define arrays to calculate min and max;
	array l_[10];
	array p_[10];
	       
	do _N_=1 by 1 until(last.building_id);
		set have;
		by building_id;
		
		*input Legionella_Load and Legionella_Present into array by obs;
		l_[_N_]=Legionella_Load;
		p_[_N_]=Legionella_Present;
		
		*calculate max and min using of keyword;
		Legionella_load_max=max(of l_[*]);
		Legionella_load_min=min(of l_[*]);
		Legionella_Present_max=max(of p_[*]);
		
		if first.building_id 
			then
				sum=.;
				
		*sum statement;
		sum+Legionella_Load;
		Legionella_load_Average=sum/_N_;
	end;
	*drop Legionella_Present variable and rename it in data step rename option;
	drop Legionella_Present;
run;
friendguy13
Calcite | Level 5

Thank you very much for your help everyone I ended up using the "proc sql" since it seemed like it would be the easiest to repurpose and add extra data to for my other analyses.

sas-innovate-2024.png

Available on demand!

Missed SAS Innovate Las Vegas? Watch all the action for free! View the keynotes, general sessions and 22 breakouts on demand.

 

Register now!

How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 5 replies
  • 735 views
  • 4 likes
  • 5 in conversation