BookmarkSubscribeRSS Feed
msf2021
Fluorite | Level 6

I am new to SAS and would like any help for the following:

I have one dataset "table1":

 

data table1;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
A 202002 xxx 1 2 1
B 202004 zz 2 23 1
C 202011 ee 4 4 1
D 202011 ff 5 5 1
E 202103 tt 6 6 1
F 202103 aa 9 89 1
;

And other dataset "table2":

 

data table2;
  input id $ yearmonth:yymmn6. var1 $ var2 var3 Target;
  format yearmonth yymmn6.;
datalines;
E 202002 aac 23 13 0
F 202004 jsk 4 14 0
A 202011 skal 5 18 0
F 202011 aks 9 13 0
B 202103 xka 1 7 0
G 202103 als 12 9 0
;

And I would like to get a stratified sample from table2 by yearmonth ("Want" table) but with following restrictions:

  • I cannot have same id in table2 that i have in table1
  • I cannot have duplicated id's in want table from table 2 after stratified sample (in table 2 they are duplicated because they can be in two different months)
  • I would like to choose the dimension of the stratified sample (n) to achieve the following:
    • appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

The output would then be :

id  yearmonth   Var1    Var2    Var3    Target
A   202002      xxx     1        2       1
B   202004      zz      2       23       1
C   202011      ee      4        4       1
D   202011      ff      5        5       1
E   202103      tt      6        6       1
F   202103      aa      9       89       1
G   202103      als     12       9       0

Any help would be very much appreciated!

Thanks!!

4 REPLIES 4
Oligolas
Barite | Level 11

Hi,

you get the output with this:

PROC SQL;
   CREATE TABLE want AS
      SELECT *
      FROM table1 
      UNION CORRESPONDING 
      (SELECT * 
       FROM table2 
       WHERE ID not in
         (SELECT ID 
          FROM table1)
      )
   ;
QUIT;

I do not understand what you mean with:

@msf2021 wrote:

I would like to choose the dimension of the stratified sample (n) to achieve the following:

appending the dataset1 and dataset2 the count of id of Target 0 data (coming from dataset2) should be 80% of the total data

________________________

- Cheers -

msf2021
Fluorite | Level 6
Hi! In that way i am not having a stratified sample. 😞
What i mean is i want to get a new dataset "want" that basically is the appendix of dataset1 and the sample (stratified) from dataset2. And i want that the rows in the appendix that come from dataset2 represent 20% of total rows. I hope that i was clear enough, if not, let me know 🙂
Oligolas
Barite | Level 11

Hi,

nope it's not clear to me. Try to post a most explicit example of what you would like to achieve and how you would select the rows to display if the rows from dataset2 exceed the 20% of total rows.

________________________

- Cheers -

ballardw
Super User

Are you going to add selection probabilities or weights to this sample?

Ready to join fellow brilliant minds for the SAS Hackathon?

Build your skills. Make connections. Enjoy creative freedom. Maybe change the world. Registration is now open through August 30th. Visit the SAS Hackathon homepage.

Register today!
How to Concatenate Values

Learn how use the CAT functions in SAS to join values from multiple variables into a single value.

Find more tutorials on the SAS Users YouTube channel.

Click image to register for webinarClick image to register for webinar

Classroom Training Available!

Select SAS Training centers are offering in-person courses. View upcoming courses for:

View all other training opportunities.

Discussion stats
  • 4 replies
  • 526 views
  • 0 likes
  • 3 in conversation