Help using Base SAS procedures

Finding "non-missing pairs" of data

Accepted Solution Solved
Reply
Occasional Contributor
Posts: 10
Accepted Solution

Finding "non-missing pairs" of data

I have a stacked dataset of countries with two years of data per/country and a year indicator. The format is correct (I need long, not wide); however, I have 1300+ variables and TONS of missing data. I need to figure out how many PAIRS of complete data I have for each country from both time periods (ideally with variables ranked from most  complete pairs to least). Normally I would do some sort of dummy coding/sum, but given the paired component here, I can't figure out how that would work. If anyone can offer suggestions, I would be very grateful... thanks in advance!


Accepted Solutions
Solution
‎05-15-2014 12:29 PM
New Contributor
Posts: 2

Re: Finding "non-missing pairs" of data

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

  infile datalines dsd dlm=' ';

  input country : $9. year var1 var2 var3 var1330;

  datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

  create table count as

  select distinct country, count(distinct year) as c_year

  from x

  group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

  by country year;

run;

proc transpose data=x out=y name=varnm;

  by country year;

  var var:;

run;

*Merge with count country/year combo information;

proc sql;

  create table y2 as

  select a.*, b.c_year

  from y as a left join count as b

  on a.country = b.country;

*Perform final check and count to get per variable available combo;

  create table z as

  select distinct a.varnm, sum(a.mark) as final_count

  from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

            from y2

            group by varnm, country) AS a

  group by varnm

  order by final_count descending;

quit;

Thanks

View solution in original post


All Replies
Super User
Super User
Posts: 7,421

Re: Finding "non-missing pairs" of data

Please post some data and required output.

Occasional Contributor
Posts: 10

Re: Finding "non-missing pairs" of data

Here is a simplified verison, the current data setup looks like this:

Country          Year     Var1     Var2     Var3...          Var 1330

Andora          1990     .               5          .                    10

Andora          2010     3              7         .                    .

Aruba            1990     4              8          6                   9

Aruba            2010     .               4          5                    .

Argentina     1990     .                5          .                    14

Argentina     2010     3              3          .                    .

Ideally I want to know that number of non-missing pairs of data by country/year combo, ranked:

Var 2: 3 (all countries/years)

Var3: 1 (only 1 country/year pair complete - Aruba)

Var1 and Var1330: 0 (no complete pairs)

Does that help? Thanks again.

Super User
Super User
Posts: 7,421

Re: Finding "non-missing pairs" of data

Hi,

There was a post today, almost the same thing: https://communities.sas.com/thread/57555

Check that out.  You would want to go the route of generating the code as you have so many variables, maybe using a datastep and call execute, or macro code.

Occasional Contributor
Posts: 10

Re: Finding "non-missing pairs" of data

Thanks again for your response. I saw that post earlier, but they have a wide dataset (vs. the long format I have). I thought it a proc sql create table command might work, but I couldn't figure the coding out...

Solution
‎05-15-2014 12:29 PM
New Contributor
Posts: 2

Re: Finding "non-missing pairs" of data

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

  infile datalines dsd dlm=' ';

  input country : $9. year var1 var2 var3 var1330;

  datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

  create table count as

  select distinct country, count(distinct year) as c_year

  from x

  group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

  by country year;

run;

proc transpose data=x out=y name=varnm;

  by country year;

  var var:;

run;

*Merge with count country/year combo information;

proc sql;

  create table y2 as

  select a.*, b.c_year

  from y as a left join count as b

  on a.country = b.country;

*Perform final check and count to get per variable available combo;

  create table z as

  select distinct a.varnm, sum(a.mark) as final_count

  from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

            from y2

            group by varnm, country) AS a

  group by varnm

  order by final_count descending;

quit;

Thanks

Occasional Contributor
Posts: 10

Re: Finding "non-missing pairs" of data

This works PERFECTLY, and the output is exactly what I needed - thank you so much for your help!! I really appreciate it!!

Occasional Contributor
Posts: 10

Re: Finding "non-missing pairs" of data

One last quick question - when I try to use this code on my actual data, the "var var:;" does not work b/c my variable names are all over the place (and not numbered simply). I tried to use "var _ALL_;" instead, but that did not seem to work. Any suggestions? So sorry and thanks again!

*Edit - nevermind, sorted it out with "_numeric_" instead of "_all_" - works great, thanks!

New Contributor
Posts: 2

Re: Finding "non-missing pairs" of data

You can resolve this issue by keeping only the necessary variables in the dataset when transposing. In this case, you only need to keep all your targeted variables (i.e var1 var2 var3 var1130 and so on) along with country and year. So you can use same code to transpose all variables without using all names just by removing the var statement from the proc transpose block.  

* Assuming that your_data have only necessary variables along with country and year;

proc transpose data=your_data out=y name=varnm;

  by country year;

run;

Super User
Posts: 9,688

Re: Finding "non-missing pairs" of data

data x;
  infile datalines dsd dlm=' ';
  input country : $9. year var1 var2 var3 var1330;
  datalines;
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
;
run;
  options symbolgen;
proc sql noprint;
select 'n('||strip(name)||') as '||strip(name) into : list separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
 create table temp as 
  select &list
   from x
    group by country ;

select count(distinct year) into : n separated by ' ' from x;
select "sum("||strip(name)||"=&n) as "||strip(name) into : list1 separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
create table want as
 select &list1 from temp;
quit;

Xia Keshan

Message was edited by: xia keshan

☑ This topic is solved.

Need further help from the community? Please ask a new question.

Discussion stats
  • 9 replies
  • 422 views
  • 3 likes
  • 4 in conversation