Solved: Re: Finding "non-missing pairs" of data

oncearunner · Posted 05-15-2014 10:24 AM

I have a stacked dataset of countries with two years of data per/country and a year indicator. The format is correct (I need long, not wide); however, I have 1300+ variables and TONS of missing data. I need to figure out how many PAIRS of complete data I have for each country from both time periods (ideally with variables ranked from most complete pairs to least). Normally I would do some sort of dummy coding/sum, but given the paired component here, I can't figure out how that would work. If anyone can offer suggestions, I would be very grateful... thanks in advance!

rajwanur · Posted 05-15-2014 12:29 PM

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

infile datalines dsd dlm=' ';

input country : $9. year var1 var2 var3 var1330;

datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

create table count as

select distinct country, count(distinct year) as c_year

from x

group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

by country year;

run;

proc transpose data=x out=y name=varnm;

by country year;

var var:;

run;

*Merge with count country/year combo information;

proc sql;

create table y2 as

select a.*, b.c_year

from y as a left join count as b

on a.country = b.country;

*Perform final check and count to get per variable available combo;

create table z as

select distinct a.varnm, sum(a.mark) as final_count

from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

from y2

group by varnm, country) AS a

group by varnm

order by final_count descending;

quit;

Thanks

View solution in original post

RW9 · Posted 05-15-2014 10:32 AM

Please post some data and required output.

oncearunner · Posted 05-15-2014 10:44 AM

Here is a simplified verison, the current data setup looks like this:

Country Year Var1 Var2 Var3... Var 1330

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

Ideally I want to know that number of non-missing pairs of data by country/year combo, ranked:

Var 2: 3 (all countries/years)

Var3: 1 (only 1 country/year pair complete - Aruba)

Var1 and Var1330: 0 (no complete pairs)

Does that help? Thanks again.

RW9 · Posted 05-15-2014 11:19 AM

Hi,

There was a post today, almost the same thing: https://communities.sas.com/thread/57555

Check that out. You would want to go the route of generating the code as you have so many variables, maybe using a datastep and call execute, or macro code.

oncearunner · Posted 05-15-2014 11:32 AM

Thanks again for your response. I saw that post earlier, but they have a wide dataset (vs. the long format I have). I thought it a proc sql create table command might work, but I couldn't figure the coding out...

rajwanur · Posted 05-15-2014 12:29 PM

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

infile datalines dsd dlm=' ';

input country : $9. year var1 var2 var3 var1330;

datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

create table count as

select distinct country, count(distinct year) as c_year

from x

group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

by country year;

run;

proc transpose data=x out=y name=varnm;

by country year;

var var:;

run;

*Merge with count country/year combo information;

proc sql;

create table y2 as

select a.*, b.c_year

from y as a left join count as b

on a.country = b.country;

*Perform final check and count to get per variable available combo;

create table z as

select distinct a.varnm, sum(a.mark) as final_count

from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

from y2

group by varnm, country) AS a

group by varnm

order by final_count descending;

quit;

Thanks

oncearunner · Posted 05-15-2014 02:33 PM

This works PERFECTLY, and the output is exactly what I needed - thank you so much for your help!! I really appreciate it!!

oncearunner · Posted 05-15-2014 02:54 PM

One last quick question - when I try to use this code on my actual data, the "var var:;" does not work b/c my variable names are all over the place (and not numbered simply). I tried to use "var _ALL_;" instead, but that did not seem to work. Any suggestions? So sorry and thanks again!

*Edit - nevermind, sorted it out with "_numeric_" instead of "_all_" - works great, thanks!

rajwanur · Posted 05-15-2014 03:19 PM

You can resolve this issue by keeping only the necessary variables in the dataset when transposing. In this case, you only need to keep all your targeted variables (i.e var1 var2 var3 var1130 and so on) along with country and year. So you can use same code to transpose all variables without using all names just by removing the var statement from the proc transpose block.

* Assuming that your_data have only necessary variables along with country and year;

proc transpose data=your_data out=y name=varnm;

by country year;

run;

Ksharp · Posted 05-16-2014 10:05 AM

data x;
  infile datalines dsd dlm=' ';
  input country : $9. year var1 var2 var3 var1330;
  datalines;
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
;
run;
  options symbolgen;
proc sql noprint;
select 'n('||strip(name)||') as '||strip(name) into : list separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
 create table temp as 
  select &list
   from x
    group by country ;

select count(distinct year) into : n separated by ' ' from x;
select "sum("||strip(name)||"=&n) as "||strip(name) into : list1 separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
create table want as
 select &list1 from temp;
quit;

Xia Keshan

Message was edited by: xia keshan

Catch up on SAS Innovate 2026

SAS Training: Just a Click Away