About MichaelWku

MichaelWku · ‎08-03-2015

I want to substitute missing values for price with a random selection from the non-missing prices of the same product within the same date range. For example:

Replace the missing price for product 2 in data line 2 with a random selection from 120, 130, or 80.
Replace the missing price for product 1 in data line 3 with a random selection from 100, 110, or 90.

PROC MI doesn’t seem to have such an option.

data one;
input subject date product price;
datalines;
1 1 1 100
1 1 2 .
2 1 1 .
2 1 2 120
3 1 1 110
3 1 2 .
4 1 1 .
4 1 2 130
5 1 1 90
5 1 2 .
6 1 1 .
6 1 2 80
7 2 1 .
7 2 2 140
8 2 1 120
8 2 2 .
9 2 1 .
9 2 2 100
10 2 1 .
10 2 2 70
11 2 1 80
11 2 2 .
12 2 1 150
12 2 2 .
;

My actual dataset has 1.3 million observations with 25 products and 6 date ranges. Each subject observed only one price, with approx. 20 missing prices for the other products. Not all subjects were exposed to all 25 products. The number of products each subject saw wasn’t fixed; some saw more, some saw less. The actual data has about 65,000 observed prices, with the rest missing.
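One possible way to do this is a random hot-deck draw using DATA step hash objects, sketched below against the sample dataset one. The names donors, counts, want, donor_price, seq, and n are illustrative and not part of the original data, and the seed 2015 is arbitrary. The sketch assumes the observed prices fit in memory and leaves a price missing when its date/product cell has no observed price at all.

```sas
/* Collect the observed prices, ordered by date/product cell */
proc sort data=one(where=(not missing(price))) out=donors;
  by date product;
run;

/* Number the donors 1..n within each cell */
data donors;
  set donors(keep=date product price rename=(price=donor_price));
  by date product;
  if first.product then seq = 0;
  seq + 1;
run;

/* One row per cell holding the donor count n */
data counts(keep=date product n);
  set donors;
  by date product;
  if last.product;
  n = seq;
run;

data want(drop=n seq donor_price);
  if _n_ = 1 then do;
    /* Never executed; only defines the variables the hash objects use */
    if 0 then set one donors counts;
    declare hash d(dataset: 'donors');
    d.defineKey('date', 'product', 'seq');
    d.defineData('donor_price');
    d.defineDone();
    declare hash c(dataset: 'counts');
    c.defineKey('date', 'product');
    c.defineData('n');
    c.defineDone();
    call streaminit(2015);  /* arbitrary seed, for reproducibility */
  end;
  set one;
  if missing(price) then do;
    /* Look up how many donors the cell has, then draw one at random */
    if c.find() = 0 then do;
      seq = ceil(n * rand('uniform'));
      if d.find() = 0 then price = donor_price;
    end;
  end;
run;
```

Each missing price gets an independent draw, with replacement, from the observed prices in its own date/product cell, so the same donor price can be used more than once. Because only the observed prices (about 65,000 in the actual data) are loaded into memory, this avoids joining every missing row to every candidate donor.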

MichaelWku · ‎07-31-2015

The first dataset (one) represents people (pid) and where they live (cid). The second dataset (two) has where they live (cid) and a set of possible product choices (pho). I need a dataset that has, for each person, all the choices they face. Note: the choice set changes based on where they live.

The below illustrates my issue. The actual data has 90,000 persons living in 180 different locations that on average will have 20 choices. The output data file will have approx. 1,800,000 obs.

data one;
input pid cid;
datalines;
1 1
2 1
3 1
4 2
5 2
;

data two;
input cid pho;
datalines;
1 1
1 2
1 3
1 4
2 1
2 4
2 5
;

/* needed output
pid cid pho
1 1 1
1 1 2
1 1 3
1 1 4
2 1 1
2 1 2
2 1 3
2 1 4
3 1 1
3 1 2
3 1 3
3 1 4
4 2 1
4 2 4
4 2 5
5 2 1
5 2 4
5 2 5
*/
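Since cid appears in both datasets and repeats in each, one possible approach is a many-to-many join on cid in PROC SQL (a DATA step MERGE does not produce all combinations in that situation). The sketch below runs against the sample datasets one and two; the output name want and the aliases a and b are illustrative.

```sas
proc sql;
  create table want as
  select a.pid, a.cid, b.pho
  from one as a
       inner join two as b
       on a.cid = b.cid      /* pair each person with every choice in their location */
  order by a.pid, b.pho;
quit;
```

For the sample data this should give the 18 rows in the needed output. A person whose cid has no rows in two would be dropped by the inner join; a left join would keep that person with a missing pho.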

Date Last Visited	‎03-03-2016 05:16 PM

replacing missing values with random selection from non-missing values...

Merging two datasets without a common by variable
