Re: Panel data-how to creat panel ID and delete ungrouped observations...

sos · Posted 10-31-2011 07:28 PM

My problem is to combine two cross sectionl data sets into a pandel data set. I make an example to express my problem:

Data set 1:

city1 city2 pass

chi det 3

chi ks 4

la dc 5

la chi 6

Data set 2:

city1 city2 pass

chi det 13

chi ks 14

la dc 15

la chi 16

la ho 20

I use the followng sas code to combine them:

data total;

set data1 data1;

run;

As you know, I want to get a panel data set with the panel of the same city1 and city2; the last obervation in data set 2 is singled and I want to delete it. In my real problem, before I combine them together, I do not know which observations are singled. So the goal to 'set' them together and delete the singled observations; and then create the panel ID. This is similar to get the intersection of two sets. Here each set is composed of the city1 and city from one data set.

Thanks a lot!

Tom · Posted 10-31-2011 07:48 PM

Might be easier to MERGE the datasets.

data want ;
  merge set1 (in=in1 rename=(pass=pass1))
        set2 (in=in2 rename=(pass=pass2))
  ;
  by city1 city2;
  if in1 and in2;
run;

If you really want to keep the structure vertical then here is one way.

data want ;
  any1=0; any2=0;
  do until (last.city2);
     set set1(in=in1) set2(in=in2);
     by city1 city2;
     if in1 then any1=1;
     if in2 then any2=1;
  end;
  length source dsname $41;
  do until (last.city2);
     set set1 set2 indsname=dsname;
     by city1 city2;
     source=dsname;
     if any1 and any2 then output;
  end;
run;

sos · Posted 10-31-2011 09:40 PM

Tom, thank you very much.

When I run the second way, I get the following error message:

ERROR: BY variables are not properly sorted on data set WORK.SET1.

The code I use:

Data set1;

infile datalines delimiter=',';

input city1 $ city2 $ pass;

datalines;

chi , det , 3

chi , ks , 4

la , dc , 5

la , chi , 6

;

run;

Data set2;

infile datalines delimiter=',';

input city1 $ city2 $ pass;

datalines;

chi , det , 13

chi , ks , 14

la , dc , 15

la , chi , 16

la , ho , 20

;

run;

data want ;

any1=0; any2=0;

do until (last.city2);

set set1(in=in1) set2(in=in2);

by city1 city2;

if in1 then any1=1;

if in2 then any2=1;

end;

length source dsname $41;

do until (last.city2);

set set1 set2 indsname=dsname;

by city1 city2;

source=dsname;

if any1 and any2 then output;

end;

run;

Thanks!

sos · Posted 10-31-2011 09:42 PM

and I get the same error message when I run the first way.

Thanks!

sos · Posted 10-31-2011 09:47 PM

I figure out how to fix the error-proc sort.

But could you say something about how to creat the panel ID for both ways?

Thanks a lot!

Ksharp · Posted 11-01-2011 02:12 AM

How About:

Data set1;
infile datalines delimiter=',';
input city1 $ city2 $ pass;
datalines;          
chi ,   det ,   3
chi ,   ks  ,   4
la  ,   dc  ,   5
la   ,  chi ,   6
;
run;

 

Data set2;
infile datalines delimiter=',';
input city1 $ city2 $ pass;
datalines;          
chi ,   det ,   13
chi ,   ks  ,   14
la  ,   dc  ,   15
la   ,  chi ,   16
la   ,  ho ,   20
;
run;
proc sql;
 create table want as
  select * from set1 
   where cats(city1,city2) in (select cats(city1,city2) from set2)
  union all  
  select * from set2
   where cats(city1,city2) in (select cats(city1,city2) from set1)
;
quit;

Ksharp

Panel data-how to creat panel ID and delete ungrouped observations.