I have a stacked dataset of countries with two years of data per/country and a year indicator. The format is correct (I need long, not wide); however, I have 1300+ variables and TONS of missing data. I need to figure out how many PAIRS of complete data I have for each country from both time periods (ideally with variables ranked from most complete pairs to least). Normally I would do some sort of dummy coding/sum, but given the paired component here, I can't figure out how that would work. If anyone can offer suggestions, I would be very grateful... thanks in advance!
Hi,
You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.
Data table Z will contain all the variable names along with non-missing country/year combo count.
data x;
infile datalines dsd dlm=' ';
input country : $9. year var1 var2 var3 var1330;
datalines;
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
;
run;
*count country year combo to check at later stage;
proc sql;
create table count as
select distinct country, count(distinct year) as c_year
from x
group by country;
quit;
* Transpose data to get all values in single column;
proc sort data=x;
by country year;
run;
proc transpose data=x out=y name=varnm;
by country year;
var var:;
run;
*Merge with count country/year combo information;
proc sql;
create table y2 as
select a.*, b.c_year
from y as a left join count as b
on a.country = b.country;
*Perform final check and count to get per variable available combo;
create table z as
select distinct a.varnm, sum(a.mark) as final_count
from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark
from y2
group by varnm, country) AS a
group by varnm
order by final_count descending;
quit;
Thanks
Please post some data and required output.
Here is a simplified verison, the current data setup looks like this:
Country Year Var1 Var2 Var3... Var 1330
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
Ideally I want to know that number of non-missing pairs of data by country/year combo, ranked:
Var 2: 3 (all countries/years)
Var3: 1 (only 1 country/year pair complete - Aruba)
Var1 and Var1330: 0 (no complete pairs)
Does that help? Thanks again.
Hi,
There was a post today, almost the same thing: https://communities.sas.com/thread/57555
Check that out. You would want to go the route of generating the code as you have so many variables, maybe using a datastep and call execute, or macro code.
Thanks again for your response. I saw that post earlier, but they have a wide dataset (vs. the long format I have). I thought it a proc sql create table command might work, but I couldn't figure the coding out...
Hi,
You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.
Data table Z will contain all the variable names along with non-missing country/year combo count.
data x;
infile datalines dsd dlm=' ';
input country : $9. year var1 var2 var3 var1330;
datalines;
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
;
run;
*count country year combo to check at later stage;
proc sql;
create table count as
select distinct country, count(distinct year) as c_year
from x
group by country;
quit;
* Transpose data to get all values in single column;
proc sort data=x;
by country year;
run;
proc transpose data=x out=y name=varnm;
by country year;
var var:;
run;
*Merge with count country/year combo information;
proc sql;
create table y2 as
select a.*, b.c_year
from y as a left join count as b
on a.country = b.country;
*Perform final check and count to get per variable available combo;
create table z as
select distinct a.varnm, sum(a.mark) as final_count
from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark
from y2
group by varnm, country) AS a
group by varnm
order by final_count descending;
quit;
Thanks
This works PERFECTLY, and the output is exactly what I needed - thank you so much for your help!! I really appreciate it!!
One last quick question - when I try to use this code on my actual data, the "var var:;" does not work b/c my variable names are all over the place (and not numbered simply). I tried to use "var _ALL_;" instead, but that did not seem to work. Any suggestions? So sorry and thanks again!
*Edit - nevermind, sorted it out with "_numeric_" instead of "_all_" - works great, thanks!
You can resolve this issue by keeping only the necessary variables in the dataset when transposing. In this case, you only need to keep all your targeted variables (i.e var1 var2 var3 var1130 and so on) along with country and year. So you can use same code to transpose all variables without using all names just by removing the var statement from the proc transpose block.
* Assuming that your_data have only necessary variables along with country and year;
proc transpose data=your_data out=y name=varnm;
by country year;
run;
data x; infile datalines dsd dlm=' '; input country : $9. year var1 var2 var3 var1330; datalines; Andora 1990 . 5 . 10 Andora 2010 3 7 . . Aruba 1990 4 8 6 9 Aruba 2010 . 4 5 . Argentina 1990 . 5 . 14 Argentina 2010 3 3 . . ; run; options symbolgen; proc sql noprint; select 'n('||strip(name)||') as '||strip(name) into : list separated by ',' from dictionary.columns where libname='WORK' and memname='X' and upcase(name) like 'VAR%'; create table temp as select &list from x group by country ; select count(distinct year) into : n separated by ' ' from x; select "sum("||strip(name)||"=&n) as "||strip(name) into : list1 separated by ',' from dictionary.columns where libname='WORK' and memname='X' and upcase(name) like 'VAR%'; create table want as select &list1 from temp; quit;
Xia Keshan
Message was edited by: xia keshan
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.