Dear all.
I work with panel data (data sets with 5 mio. observations pr year and hundreds of variables) at University of Aarhus, department of Economics, and am looking for a program which can do the following:
For some variables are the all observations missing for some years.It would be of great use to have a program which could make a table with one observation for each year telling for all the variables whether this variable is fully missing for the specific year. Missing is defined as SAS standards for both numeric and characters.
Example of data
Header 1 | Header 2 | Header 3 | Header 4 |
---|---|---|---|
Personidentifier | Year | Male | Income |
1 | 1999 | 1 | 1000 |
2 | 1999 | 0 | 1500 |
1 | 2000 | 1 | . |
2 | 2000 | 0 | . |
1 | 2001 | 1 | 1400 |
2 | 2001 | 0 | . |
And wanted table:
Header 1 | Header 2 | Header 3 |
---|---|---|
Year | Male_miss | Income_miss |
1999 | 0 | 0 |
2000 | 0 | 1 |
2001 | 0 | 0 |
Hope it makes sense and that somebody it able to help.
Thanks a lot!
Best // Silas
OK.
data have; input Personidentifier Year Male Income ; cards; 1 1999 1 1000 2 1999 0 1500 1 2000 1 . 2 2000 0 . 1 2001 1 1400 2 2001 0 . ; run; proc sql; create table want as select Year,nmiss(Male)=count(*) as male_miss,nmiss(Income)=count(*) as income_miss from have group by Year ; quit;
Ksharp
In case performance is one of your concerns (It seems to me you have a big table), comparing the following data step approach with Ksharp's Proc SQL, choose the one that runs faster:
Also note, the following solution can also ease your hard coding to some extend.
data have;
input Personidentifier Year Male Income;
cards;
1 1999 1 1000
2 1999 0 1500
1 2000 1 .
2 2000 0 .
1 2001 1 1400
2 2001 0 .
;
proc sql NOPRINT;
select name into :old separated by ' ' from dictionary.columns
where upcase(name) not in ('PERSONIDENTIFIER', 'YEAR')
AND LIBNAME='WORK' AND MEMNAME='HAVE';
SELECT cats(name,'_','miss') INTO :NEW separated by ' ' from dictionary.columns
where upcase(name) not in ('PERSONIDENTIFIER', 'YEAR')
AND LIBNAME='WORK' AND MEMNAME='HAVE';
QUIT;
data want;
retain year &new;
array new &new ;
do i=1 to dim(new);
new(i)=1;
end;
DO UNTIL (LAST.YEAR);
SET HAVE ;
BY YEAR NOTSORTED; /* IN CASE 'YEAR' IS ONLY CLUSTERED, NOT SORTED*/
array old &old.;
do i=1 to dim(old);
new(i)=ifn(not missing(old(i)),0,new(i));
end;
end;
keep year &new;
run;
Haikuo
Don't miss out on SAS Innovate - Register now for the FREE Livestream!
Can't make it to Vegas? No problem! Watch our general sessions LIVE or on-demand starting April 17th. Hear from SAS execs, best-selling author Adam Grant, Hot Ones host Sean Evans, top tech journalist Kara Swisher, AI expert Cassie Kozyrkov, and the mind-blowing dance crew iLuminate! Plus, get access to over 20 breakout sessions.
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.