Re: Missing values in datasets: Due to proc transpose

ak2011 · Posted 11-25-2019 02:41 PM

I would appreciate it if someone could explain this to me and provide the right code to
resolve this problem:
I have a small dataset with 17 obs. I transposed it, created exposed (1) and unexposed (0) variables, and looked for
an association between them. The resulting data had 8 obs with dots (.), i.e. missing values, but
proc freq did not display or use the missing values in the computations, even though I did not use the
/missing or /missprint options. (Results attached, please.)
However, when I apply the same code to the larger dataset (over 100,000 obs), the dots (.) were displayed
and thousands of missing values were indicated in the proc freq results (not shown).
I want to understand why no missing data were indicated in the proc freq output for the smaller dataset, even though it had dots (.)
like the larger dataset.
May I know the code to change the dots (.) to blanks (i.e. nothing) so that SAS will not show any missing values,
as in the case of the smaller dataset?
Thanks.

data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 990021
os1 1 211700
os1 2 211700
os1 2 990021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 990021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;
run;




/*TRANSPOSING VARIABLES*/
proc sort data=idnew1; by id job;
proc transpose data=idnew1 out=idnew1b prefix=idchem;
by id job;
/*id job;*/
var idchem;
run;

/*Cla exposure*/
data clat; 
set idnew1b; 
if idchem1='990005'or idchem2='990005' or idchem3='990005' or idchem4='990005' then cla_exp=1; 
else cla_exp=0;
id_job=catx('_', id, job); 
put _all_; 
drop _name_;
run; 
proc print data=clat;
Title "Cla exposure";
run;

/*Bio exposure*/
data biot; 
set idnew1b; 
if idchem1='990021' or idchem2='990021' or idchem3='990021' or idchem4='990021' then bio_exp=1; 
else bio_exp=0;
id_job=catx('_', id, job); 
put _all_;
drop _name_; 
run; 
Title "Bio exposure";
proc print data=biot;

run;

/*Amo exposure*/
data amot; 
set idnew1mb; 
if idchem1='210701' or idchem2='210701' or idchem3='210701' or idchem4='210701' then amo_exp=1; 
else amo_exp=0;
id_job=catx('_', id, job); 
put _all_;
drop _name_; 
run; 

proc print data=amot;
Title "Amo exposure";
run;


/*Chl exposure*/
data chlt; 
set idnew1b; 
if idchem1='211700' or idchem2='211700' or idchem3='211700'or idchem4='211700' then chl_exp=1; 
else chl_exp=0;
id_job=catx('_', id, job); 
put _all_; 
drop _name_;
run; 

proc print data=chlt;
Title "chl exposure";
run;

/* Merging clat, biot, amot and chlt files */
data mlt; merge clat biot amot chlt;

run;

proc print data=mlt;
Title "Merged exposure files for cla ,bio, amo and chl pollutants";
run;


/*CROSS ASSOCIATIONS:clat,biot,amot,chlt*/
proc freq data=mlt;
tables cla_exp*bio_exp;
tables cla_exp*amo_exp;
tables cla_exp*chl_exp;
tables bio_exp*cla_exp;
tables bio_exp*amo_exp;
tables bio_exp*chl_exp;
run;

art297 · Posted 11-25-2019 02:57 PM

Why do you put the idchem1 thru idchem4 values in quotes when they're numeric variables?

Also, when you try to create the dataset amot, you read from the wrong dataset (idnew1mb). It should have been: set idnew1b;
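
A minimal sketch of the amot step with both points applied, assuming idnew1b is the transposed dataset created earlier in the thread and that idchem1-idchem4 are numeric (idchem was read as a numeric variable). It uses the WHICHN function, which returns the position of the first argument that matches the search value, or 0 if none match. This is one possible way to write the test, not necessarily what the original poster ended up using:

```sas
/* Amo exposure, reading the transposed dataset idnew1b */
data amot;
  set idnew1b;
  /* Numeric comparison, no quotes: 1 if any idchem column equals 210701, else 0 */
  amo_exp = (whichn(210701, of idchem1-idchem4) > 0);
  id_job = catx('_', id, job);
  drop _name_;
run;
```

The same pattern would apply to the cla, bio and chl steps, with only the chemical code and the output variable name changed.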

Art, CEO, AnalystFinder.com

PGStats · Posted 11-26-2019 12:25 AM

You can use the missing option in proc freq to treat missing values as a valid level. Here is how it could be done:

data idnew1;
input id$ job idchem;
datalines;
os1 1 990005
os1 1 990021
os1 1 211700
os1 2 211700
os1 2 990021
os1 2 210701
os1 2 990005
os2 1 210701
os2 1 990005
os2 2 990021
os2 3 210701
os2 3 990005
os3 3 210701
os3 1 211700
os4 1 210701
os4 1 990005
os4 1 211700
;

proc format;
value idchem
990005 = "cla_exp"
990021 = "bio_exp"
210701 = "amo_exp"
211700 = "chl_exp";
run;

data temp;
set idnew1;
dum = 1;
format idchem idchem.;
run;

proc sort data=temp; by id job idchem; run;

proc transpose data=temp out=idnew2(drop=_name_);
by id job;
id idchem;
var dum;
run;

proc freq data=idnew2;
tables 
	cla_exp*bio_exp 
	cla_exp*amo_exp 
	cla_exp*chl_exp 
	bio_exp*amo_exp
	bio_exp*chl_exp
	amo_exp*chl_exp / missing;
run;
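
By default, PROC FREQ leaves observations with a missing value in either table variable out of a crosstabulation and reports their count as "Frequency Missing" beneath the table; the MISSING option used above counts them as a level instead. If the dots should rather be read as "unexposed", one option is to recode them to 0 before running PROC FREQ. The following is a sketch under that assumption; the dataset name idnew3 and the array name flags are hypothetical:

```sas
/* Recode missing exposure flags to 0 (assumes missing means unexposed) */
data idnew3;
  set idnew2;
  array flags{*} cla_exp bio_exp amo_exp chl_exp;
  do i = 1 to dim(flags);
    if missing(flags{i}) then flags{i} = 0;
  end;
  drop i;
run;

/* With no missing values left, every observation enters the tables */
proc freq data=idnew3;
  tables cla_exp*bio_exp cla_exp*amo_exp;
run;
```

To change only how missing numeric values are displayed, without altering the data, the system option MISSING= can be set, for example options missing=' '; prints a blank in place of the dot.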

PG

ak2011 · Posted 11-29-2019 03:06 AM

Perfect! It works! Thanks very much!
