I have problem to merge something in SAS. I have this dataset in SAS:
data ami_aggregert; set ami_aggr; tjenester _NPR = tjenestebruk_Sum; label lopenummer uke tjeneste_NPR index; run; * then i sort; proc sort data=ami_aggregert; by lopenummer index uke; run; proc sort data=ami_weeks_ind; by lopenummer index uke; run; data ami_weeks_full_NPR; merge ami_aggregert (in=a) ami_weeks_ind (in=b) ; if a and b; by lopenummer index uke; run;
The first dataset ami_aggregert contains 11 rows and if I merge it with 52 weeks of within each index, then I got 16 rows. I don't understand that?
Must i sort it on another way?
The first dataset ami_aggregert contains 11 rows and if I merge it with 52 weeks of within each index, then I got 16 rows. I don't understand that?
Since you are restricting the results to observations that have contributions from both inputs you should only get 11 observations if the key variables uniquely identify observations in both datasets. The fact that instead you got 16 means that the extra 5 observations represent those that have repeated observations in one (or both) of the two input datasets.
What are they key variables that uniquely identify the observations in the first dataset?
What are the key variables that uniquely identify the observations in the second dataset?
What are the variables that you want to combine on? What do you want to do if there are repeated observations for those variables?
Questions like these are always easier if we can see/use some sample data.
Is ami_weeks_ind uniqye by lopenummer, index, uke ?
@user40 To minimize the guesswork for us and also to allow us to spend our time on answering your question with tested code what would help is:
1. Post sample data via a fully working and tested SAS data step (the have data)
2. Post the desired result (either another SAS data step creating the want data or then in some other way)
3. Describe the logic to get from have to want
By sharing the want data based on the have data it's much easier for us to interpret your description of the logic as it's often really hard to formulate such logic unambiguously.
Also: It requires a bit more prep work on your end but if you provide such information upfront you will get most of the time rather quickly the answer you need.
The first dataset ami_aggregert contains 11 rows and if I merge it with 52 weeks of within each index, then I got 16 rows. I don't understand that?
Since you are restricting the results to observations that have contributions from both inputs you should only get 11 observations if the key variables uniquely identify observations in both datasets. The fact that instead you got 16 means that the extra 5 observations represent those that have repeated observations in one (or both) of the two input datasets.
What are they key variables that uniquely identify the observations in the first dataset?
What are the key variables that uniquely identify the observations in the second dataset?
What are the variables that you want to combine on? What do you want to do if there are repeated observations for those variables?
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.