Solved: Re: How do I subset a dataset, create a new row from its content, and ...

friendguy13 · Posted 11-30-2020 12:45 AM

I am working with a data set of water samples that were tested for the presence of legionella using PCR. I most cases multiple samples were collected from different sites in the same building and all of the rows of the table contain a building ID indicating which building they are from. I am looking to subset the data by building ID, build a new row of data for each subset consisting of the combined stats from all the rows of the subset, and then merge the newly created rows into a new data set. I know I can subset the data by using "by Building_ID" but I am totally lost on creating and merging a new data set from those subsets.

For reference my data looks something like this:

Sample ID	Building ID	Facility Type	Legionella Present	Legionella Load	Building Floors
1	1	1	1	500	1
2	1	1	0	0	1
3	2	2	0	0	3
4	3	2	1	6000	5
5	3	2	0	0	5
6	3	2	0	0	5
7	4	1	1	200	2
8	4	1	1	400	2

And I need my output to look something like this:

Building ID	Legionella Present	Legionella Load Max	Legionella Load Min	Legionella Load Average	Building Floors
1	1	500	0	250	1
2	0	0	0	0	3
3	1	6000	0	2000	5
4	1	400	200	300	2

LittlesasMaster · Posted 11-30-2020 02:40 AM

You can also try simple proc sql query.

/* Created sample data here */

proc import datafile='C:\Users\Dharmu\Desktop\Minu SAS Training\Water.xlsx' out=have dbms=xlsx replace;run;

proc sort data=have;by building_id Legionella_Load;run;

proc sql;
create table want as
select Building_ID,
max(Legionella_Present) as Legionella_Present,
min(Legionella_Load) as Min_Legionella_Load,
max(Legionella_Load) as Max_Legionella_Load,
Avg(Legionella_Load) as Avg_Legionella_Load,
max(building_floors) as building_floors
from have group by Building_ID;
quit;

proc print data=want;run;

View solution in original post

andreas_lds · Posted 11-30-2020 01:19 AM

Assuming that Legionella_Present and Legionella_Load are numeric:

proc summary data=work.have nway;
   class Building_ID;
   id Building_Floors;
   var  Legionella_Present Legionella_Load;
   output out= work.calc(drop= _type_ _freq_) max= min(Legionella_Load)= mean(Legionella_Load)= /autoname;
run;

LittlesasMaster · Posted 11-30-2020 02:40 AM

You can also try simple proc sql query.

/* Created sample data here */

proc import datafile='C:\Users\Dharmu\Desktop\Minu SAS Training\Water.xlsx' out=have dbms=xlsx replace;run;

proc sort data=have;by building_id Legionella_Load;run;

proc sql;
create table want as
select Building_ID,
max(Legionella_Present) as Legionella_Present,
min(Legionella_Load) as Min_Legionella_Load,
max(Legionella_Load) as Max_Legionella_Load,
Avg(Legionella_Load) as Avg_Legionella_Load,
max(building_floors) as building_floors
from have group by Building_ID;
quit;

proc print data=want;run;

tagawa_sas · Posted 11-30-2020 05:54 PM

You could potentially use a PROC REPORT function to get the data pretty close to what you want.

DATA WORK.QA;
INFILE DATALINES;
INPUT SampleID
BuildingID
FacilityType
LegionellaPresent
LegioellaLoad
Buildingfloors
;
DATALINES;
1 1 1 1 500 1
2 1 1 0 0 1
3 2 2 0 0 3
4 3 2 1 6000 5
5 3 2 0 0 5
6 3 2 0 0 5
7 4 1 1 200 2
8 4 1 1 400 2
;

So based off the data that you gave us, I read the data in by using datalines, so the names of the variables are listed above. After that I used a PROC REPORT function to get the table.

PROC REPORT DATA = WORK.QA
SPLIT = "/";
COLUMN BuildingID LegionellaPresent LegioellaLoad,(MAX MIN MEAN) Buildingfloors;

DEFINE BuildingID / GROUP "Building/ID";
DEFINE LegionellaPresent / "Legionella/Present";
DEFINE Buildingfloors / GROUP "Building/floors";
DEFINE LegioellaLoad / "Legioella Load";
RUN;

So here I am creating a table with the columns as the variable names for BuildingID LegionellaPresent LegioellaLoad, Buildingfloors. For LegioellaLoad I asked SAS to find the max, min, and mean.

Then I grouped the data by BuildingID and Buildingfloors, and renamed all the variables, so that they somewhat matched the description above.

At the end, the table should look something like this.

Legioella Load BuildingID LegionellaPresent MAX MIN MEAN Buildingfloors

1	1	500	0	250	1
2	0	0	0	0	3
3	1	6000	0	2000	5
4	2	400	200	300	2

Hope this helps!

hhinohar · Posted 11-30-2020 08:17 PM

data have;
	infile datalines dlm="09"x;
	input Sample_ID Building_ID Facility_Type Legionella_Present Legionella_Load 
		Building_Floors;
	datalines;
1	1	1	1	500	1
2	1	1	0	0	1
3	2	2	0	0	3
4	3	2	1	6000	5
5	3	2	0	0	5
6	3	2	0	0	5
7	4	1	1	200	2
8	4	1	1	400	2
;
run;

data want(keep=Building_ID
	       Legionella_Present_max
	       Legionella_load_max
	       Legionella_load_min
	       Legionella_load_Average
	       Building_Floors rename=(Legionella_Present_max=Legionella_Present)
);
	*keep variables ordered using length statement;
	length Building_ID 
	       Legionella_Present_max 
	       Legionella_load_max 
	       Legionella_load_min 
	       Legionella_load_Average
	       Building_Floors 8;
	*define arrays to calculate min and max;
	array l_[10];
	array p_[10];
	       
	do _N_=1 by 1 until(last.building_id);
		set have;
		by building_id;
		
		*input Legionella_Load and Legionella_Present into array by obs;
		l_[_N_]=Legionella_Load;
		p_[_N_]=Legionella_Present;
		
		*calculate max and min using of keyword;
		Legionella_load_max=max(of l_[*]);
		Legionella_load_min=min(of l_[*]);
		Legionella_Present_max=max(of p_[*]);
		
		if first.building_id 
			then
				sum=.;
				
		*sum statement;
		sum+Legionella_Load;
		Legionella_load_Average=sum/_N_;
	end;
	*drop Legionella_Present variable and rename it in data step rename option;
	drop Legionella_Present;
run;

friendguy13 · Posted 12-01-2020 02:04 PM

Thank you very much for your help everyone I ended up using the "proc sql" since it seemed like it would be the easiest to repurpose and add extra data to for my other analyses.

How do I subset a dataset, create a new row from its content, and remerge the rows to a new dataset?

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

How do I subset a dataset, create a new row from its content, and remerge the rows to a new dataset?

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Re: How do I subset a dataset, create a new row from its content, and remerge the rows to a new data

Registration is open

SAS Training: Just a Click Away