BookmarkSubscribeRSS Feed
🔒 This topic is solved and locked. Need further help from the community? Please sign in and ask a new question.
oncearunner
Calcite | Level 5

I have a stacked dataset of countries with two years of data per/country and a year indicator. The format is correct (I need long, not wide); however, I have 1300+ variables and TONS of missing data. I need to figure out how many PAIRS of complete data I have for each country from both time periods (ideally with variables ranked from most  complete pairs to least). Normally I would do some sort of dummy coding/sum, but given the paired component here, I can't figure out how that would work. If anyone can offer suggestions, I would be very grateful... thanks in advance!

1 ACCEPTED SOLUTION

Accepted Solutions
rajwanur
Calcite | Level 5

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

  infile datalines dsd dlm=' ';

  input country : $9. year var1 var2 var3 var1330;

  datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

  create table count as

  select distinct country, count(distinct year) as c_year

  from x

  group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

  by country year;

run;

proc transpose data=x out=y name=varnm;

  by country year;

  var var:;

run;

*Merge with count country/year combo information;

proc sql;

  create table y2 as

  select a.*, b.c_year

  from y as a left join count as b

  on a.country = b.country;

*Perform final check and count to get per variable available combo;

  create table z as

  select distinct a.varnm, sum(a.mark) as final_count

  from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

            from y2

            group by varnm, country) AS a

  group by varnm

  order by final_count descending;

quit;

Thanks

View solution in original post

9 REPLIES 9
RW9
Diamond | Level 26 RW9
Diamond | Level 26

Please post some data and required output.

oncearunner
Calcite | Level 5

Here is a simplified verison, the current data setup looks like this:

Country          Year     Var1     Var2     Var3...          Var 1330

Andora          1990     .               5          .                    10

Andora          2010     3              7         .                    .

Aruba            1990     4              8          6                   9

Aruba            2010     .               4          5                    .

Argentina     1990     .                5          .                    14

Argentina     2010     3              3          .                    .

Ideally I want to know that number of non-missing pairs of data by country/year combo, ranked:

Var 2: 3 (all countries/years)

Var3: 1 (only 1 country/year pair complete - Aruba)

Var1 and Var1330: 0 (no complete pairs)

Does that help? Thanks again.

RW9
Diamond | Level 26 RW9
Diamond | Level 26

Hi,

There was a post today, almost the same thing: https://communities.sas.com/thread/57555

Check that out.  You would want to go the route of generating the code as you have so many variables, maybe using a datastep and call execute, or macro code.

oncearunner
Calcite | Level 5

Thanks again for your response. I saw that post earlier, but they have a wide dataset (vs. the long format I have). I thought it a proc sql create table command might work, but I couldn't figure the coding out...

rajwanur
Calcite | Level 5

Hi,

You can transpose your data in single column with country/year combo and then check and count non-missing pairs. Hope the following code will help what you are seeking for.

Data table Z will contain all the variable names along with non-missing country/year combo count.

data x;

  infile datalines dsd dlm=' ';

  input country : $9. year var1 var2 var3 var1330;

  datalines;

Andora 1990 . 5 . 10

Andora 2010 3 7 . .

Aruba 1990 4 8 6 9

Aruba 2010 . 4 5 .

Argentina 1990 . 5 . 14

Argentina 2010 3 3 . .

;

run;

*count country year combo to check at later stage;

proc sql;

  create table count as

  select distinct country, count(distinct year) as c_year

  from x

  group by country;

quit;

* Transpose data to get all values in single column;

proc sort data=x;

  by country year;

run;

proc transpose data=x out=y name=varnm;

  by country year;

  var var:;

run;

*Merge with count country/year combo information;

proc sql;

  create table y2 as

  select a.*, b.c_year

  from y as a left join count as b

  on a.country = b.country;

*Perform final check and count to get per variable available combo;

  create table z as

  select distinct a.varnm, sum(a.mark) as final_count

  from (select distinct varnm, country, count(col1) as count, c_year, case when calculated count = c_year then 1 else 0 end as mark

            from y2

            group by varnm, country) AS a

  group by varnm

  order by final_count descending;

quit;

Thanks

oncearunner
Calcite | Level 5

This works PERFECTLY, and the output is exactly what I needed - thank you so much for your help!! I really appreciate it!!

oncearunner
Calcite | Level 5

One last quick question - when I try to use this code on my actual data, the "var var:;" does not work b/c my variable names are all over the place (and not numbered simply). I tried to use "var _ALL_;" instead, but that did not seem to work. Any suggestions? So sorry and thanks again!

*Edit - nevermind, sorted it out with "_numeric_" instead of "_all_" - works great, thanks!

rajwanur
Calcite | Level 5

You can resolve this issue by keeping only the necessary variables in the dataset when transposing. In this case, you only need to keep all your targeted variables (i.e var1 var2 var3 var1130 and so on) along with country and year. So you can use same code to transpose all variables without using all names just by removing the var statement from the proc transpose block.  

* Assuming that your_data have only necessary variables along with country and year;

proc transpose data=your_data out=y name=varnm;

  by country year;

run;

Ksharp
Super User
data x;
  infile datalines dsd dlm=' ';
  input country : $9. year var1 var2 var3 var1330;
  datalines;
Andora 1990 . 5 . 10
Andora 2010 3 7 . .
Aruba 1990 4 8 6 9
Aruba 2010 . 4 5 .
Argentina 1990 . 5 . 14
Argentina 2010 3 3 . .
;
run;
  options symbolgen;
proc sql noprint;
select 'n('||strip(name)||') as '||strip(name) into : list separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
 create table temp as 
  select &list
   from x
    group by country ;

select count(distinct year) into : n separated by ' ' from x;
select "sum("||strip(name)||"=&n) as "||strip(name) into : list1 separated by ','
 from dictionary.columns
  where libname='WORK' and memname='X' and upcase(name) like 'VAR%';
create table want as
 select &list1 from temp;
quit;

Xia Keshan

Message was edited by: xia keshan

sas-innovate-2024.png

Join us for SAS Innovate April 16-19 at the Aria in Las Vegas. Bring the team and save big with our group pricing for a limited time only.

Pre-conference courses and tutorials are filling up fast and are always a sellout. Register today to reserve your seat.

 

Register now!

What is Bayesian Analysis?

Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 9 replies
  • 1142 views
  • 3 likes
  • 4 in conversation