My problem is to combine two cross sectionl data sets into a pandel data set. I make an example to express my problem:
Data set 1:
city1 city2 pass
chi det 3
chi ks 4
la dc 5
la chi 6
Data set 2:
city1 city2 pass
chi det 13
chi ks 14
la dc 15
la chi 16
la ho 20
I use the followng sas code to combine them:
data total;
set data1 data1;
run;
As you know, I want to get a panel data set with the panel of the same city1 and city2; the last obervation in data set 2 is singled and I want to delete it. In my real problem, before I combine them together, I do not know which observations are singled. So the goal to 'set' them together and delete the singled observations; and then create the panel ID. This is similar to get the intersection of two sets. Here each set is composed of the city1 and city from one data set.
Thanks a lot!
Might be easier to MERGE the datasets.
data want ;
merge set1 (in=in1 rename=(pass=pass1))
set2 (in=in2 rename=(pass=pass2))
;
by city1 city2;
if in1 and in2;
run;
If you really want to keep the structure vertical then here is one way.
data want ;
any1=0; any2=0;
do until (last.city2);
set set1(in=in1) set2(in=in2);
by city1 city2;
if in1 then any1=1;
if in2 then any2=1;
end;
length source dsname $41;
do until (last.city2);
set set1 set2 indsname=dsname;
by city1 city2;
source=dsname;
if any1 and any2 then output;
end;
run;
Tom, thank you very much.
When I run the second way, I get the following error message:
ERROR: BY variables are not properly sorted on data set WORK.SET1.
The code I use:
Data set1;
infile datalines delimiter=',';
input city1 $ city2 $ pass;
datalines;
chi , det , 3
chi , ks , 4
la , dc , 5
la , chi , 6
;
run;
Data set2;
infile datalines delimiter=',';
input city1 $ city2 $ pass;
datalines;
chi , det , 13
chi , ks , 14
la , dc , 15
la , chi , 16
la , ho , 20
;
run;
data want ;
any1=0; any2=0;
do until (last.city2);
set set1(in=in1) set2(in=in2);
by city1 city2;
if in1 then any1=1;
if in2 then any2=1;
end;
length source dsname $41;
do until (last.city2);
set set1 set2 indsname=dsname;
by city1 city2;
source=dsname;
if any1 and any2 then output;
end;
run;
Thanks!
and I get the same error message when I run the first way.
Thanks!
I figure out how to fix the error-proc sort.
But could you say something about how to creat the panel ID for both ways?
Thanks a lot!
How About:
Data set1; infile datalines delimiter=','; input city1 $ city2 $ pass; datalines; chi , det , 3 chi , ks , 4 la , dc , 5 la , chi , 6 ; run; Data set2; infile datalines delimiter=','; input city1 $ city2 $ pass; datalines; chi , det , 13 chi , ks , 14 la , dc , 15 la , chi , 16 la , ho , 20 ; run; proc sql; create table want as select * from set1 where cats(city1,city2) in (select cats(city1,city2) from set2) union all select * from set2 where cats(city1,city2) in (select cats(city1,city2) from set1) ; quit;
Ksharp
Are you ready for the spotlight? We're accepting content ideas for SAS Innovate 2025 to be held May 6-9 in Orlando, FL. The call is open until September 25. Read more here about why you should contribute and what is in it for you!
Learn the difference between classical and Bayesian statistical approaches and see a few PROC examples to perform Bayesian analysis in this video.
Find more tutorials on the SAS Users YouTube channel.