I am new to SAS and would like any help for the following:
I have one dataset "table1":
data table1; input id $ yearmonth:yymmn6. var1 $ var2 var3 Target; format yearmonth yymmn6.; datalines; A 202002 xxx 1 2 1 B 202004 zz 2 23 1 C 202011 ee 4 4 1 D 202011 ff 5 5 1 E 202103 tt 6 6 1 F 202103 aa 9 89 1 ;
And other dataset "table2":
data table2; input id $ yearmonth:yymmn6. var1 $ var2 var3 Target; format yearmonth yymmn6.; datalines; E 202002 aac 23 13 0 F 202004 jsk 4 14 0 A 202011 skal 5 18 0 F 202011 aks 9 13 0 B 202103 xka 1 7 0 G 202103 als 12 9 0 ;
And I would like to get a stratified sample from table2 by yearmonth ("Want" table) but with following restrictions:
The output would then be :
id yearmonth Var1 Var2 Var3 Target A 202002 xxx 1 2 1 B 202004 zz 2 23 1 C 202011 ee 4 4 1 D 202011 ff 5 5 1 E 202103 tt 6 6 1 F 202103 aa 9 89 1 G 202103 als 12 9 0
Any help would be very much appreciated!
Thanks!!
Hi,
you get the output with this:
PROC SQL;
CREATE TABLE want AS
SELECT *
FROM table1
UNION CORRESPONDING
(SELECT *
FROM table2
WHERE ID not in
(SELECT ID
FROM table1)
)
;
QUIT;
I do not understand what you mean with:@msf2021 wrote:
I would like to choose the dimension of the stratified sample (n) to achieve the following:
appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data
- Cheers -
Hi,
nope it's not clear to me. Try to post a most explicit example of what you would like to achieve and how you would select the rows to display if the rows from dataset2 exceed the 20% of total rows.
- Cheers -
Are you going to add selection probabilities or weights to this sample?
SAS Innovate 2025 is scheduled for May 6-9 in Orlando, FL. Sign up to be first to learn about the agenda and registration!
Learn how use the CAT functions in SAS to join values from multiple variables into a single value.
Find more tutorials on the SAS Users YouTube channel.
Ready to level-up your skills? Choose your own adventure.