
Statistics and Linear Algebra 5

Posted: 2016-12-03 09:51:32

1. Finding the row with the minimum value in pandas:

  # idxmin() returns the index label of the smallest median income; use it to look up the county
  lowest_income_county = income["county"][income["median_income"].idxmin()]

  high_pop_county = income[income["pop_over_25"] > 500000]

  # among counties with more than 500000 residents, find the one with the lowest median income
  lowest_income_high_pop_county = high_pop_county["county"][high_pop_county["median_income"].idxmin()]
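As a small illustration of the idxmin() lookup, here is a sketch on a made-up DataFrame. The name income_demo, the county names, and all numbers are hypothetical and not taken from the income data. It uses .loc with the label that idxmin() returns, which is equivalent to the chained indexing above.

```python
import pandas as pd

# Hypothetical stand-in for the income data; all values are invented for illustration.
income_demo = pd.DataFrame({
    "county": ["A County", "B County", "C County", "D County"],
    "median_income": [41000, 28000, 52000, 33000],
    "pop_over_25": [600000, 120000, 750000, 510000],
})

# idxmin() returns the index label of the smallest value, not the value itself.
lowest_label = income_demo["median_income"].idxmin()
lowest_county = income_demo.loc[lowest_label, "county"]

# Filter first, then call idxmin() on the filtered frame.
# The filtered frame keeps the original index labels, so .loc finds the right row.
high_pop = income_demo[income_demo["pop_over_25"] > 500000]
lowest_high_pop_county = high_pop.loc[high_pop["median_income"].idxmin(), "county"]

print(lowest_county, lowest_high_pop_county)
```

Because idxmin() returns a label rather than a position, the lookup keeps working after filtering, as long as it is done with label-based indexing.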

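2. Seeding the random number generator: after random.seed(), the calls that follow draw from a reproducible sequence, so reseeding with the same value replays the same numbers:

  import random

  random.seed(20) # set the random seed

  new_sequence = [random.randint(0,10) for _ in range(10)] # 10 integers from 0 to 10, inclusive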
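To see what the seed does, here is a minimal sketch. The helper draw() is a hypothetical name introduced only for this example. It reseeds the generator and draws a sequence twice.

```python
import random

def draw(seed, n=10):
    """Seed the generator, then draw n integers from 0 to 10 inclusive."""
    random.seed(seed)
    return [random.randint(0, 10) for _ in range(n)]

first = draw(20)
second = draw(20)  # same seed, so the same sequence is replayed
other = draw(21)   # a different seed generally gives a different sequence

print(first == second)
print(first)
print(other)
```

Without reseeding, each further call simply continues along the same deterministic sequence.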

3. To select a given number of samples from the data:

  shopping_sample = random.sample(shopping, 4) # pick 4 items from the list shopping, without replacement
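A short sketch of how random.sample() behaves. The contents of shopping are not shown in these notes, so the list below is a made-up example.

```python
import random

# Hypothetical shopping amounts; the real contents of `shopping` are not shown.
shopping = [300, 200, 100, 600, 20, 45, 80]

random.seed(1)
# Sampling is without replacement: each position in the list is chosen at most once.
shopping_sample = random.sample(shopping, 4)

# Asking for more items than the list holds raises ValueError.
try:
    random.sample(shopping, len(shopping) + 1)
    too_many_ok = True
except ValueError:
    too_many_ok = False

print(shopping_sample, too_many_ok)
```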

4. Roll a die 10 times (values 1 to 6) and plot the results as a histogram with 6 bins:

  import matplotlib.pyplot as plt

  def roll():
    return random.randint(1, 6) # return a random integer from 1 to 6, inclusive

  random.seed(1)
  small_sample = [roll() for _ in range(10)]

  plt.hist(small_sample, 6)
  plt.show()
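To check the counts behind the histogram without a plot, one option is to tally the rolls with collections.Counter. This is a sketch that is not part of the original notes; it repeats roll() so that it runs on its own.

```python
import random
from collections import Counter

def roll():
    # Random integer from 1 to 6, inclusive, as in the notes.
    return random.randint(1, 6)

random.seed(1)
small_sample = [roll() for _ in range(10)]

# Count how many times each face came up; faces that never appear count as 0.
counts = Counter(small_sample)
for face in range(1, 7):
    print(face, "#" * counts[face])
```

With only 10 rolls the counts are usually far from even, which is why a small sample says little about the true probabilities.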

5. Roll the die 50 times, estimate the probability of rolling a 1 from those rolls, and repeat the experiment 300 times:

  def probability_of_one(num_trials, num_rolls):
    probabilities = []
    for i in range(num_trials):
      die_rolls = [roll() for _ in range(num_rolls)]
      one_prob = len([d for d in die_rolls if d==1]) / num_rolls
      probabilities.append(one_prob)
    return probabilities

  random.seed(1)
  small_sample = probability_of_one(300, 50)
  plt.hist(small_sample, 20)
  plt.show()
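The spread of these estimates depends on the number of rolls per trial: for a fair die the standard deviation of the estimated probability is sqrt(p(1-p)/n) with p = 1/6 and n = num_rolls. The sketch below is an addition to the notes; it measures that spread for a few values of num_rolls, which are chosen arbitrarily.

```python
import random
import statistics

def roll():
    return random.randint(1, 6)

def probability_of_one(num_trials, num_rolls):
    # For each trial, estimate P(roll == 1) from num_rolls rolls.
    probabilities = []
    for _ in range(num_trials):
        die_rolls = [roll() for _ in range(num_rolls)]
        probabilities.append(die_rolls.count(1) / num_rolls)
    return probabilities

random.seed(1)
spread = {}
for num_rolls in (10, 100, 1000):
    estimates = probability_of_one(300, num_rolls)
    # Standard deviation of the 300 estimates for this number of rolls.
    spread[num_rolls] = statistics.pstdev(estimates)
    print(num_rolls, round(statistics.mean(estimates), 3), round(spread[num_rolls], 3))
```

More rolls per trial give a narrower histogram centred near 1/6.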

6. Random sampling gives a more representative picture of the data than taking consecutive rows:

  mean_median_income = income["median_income"].mean()
  print(mean_median_income)

  def get_sample_mean(start, end):
    return income["median_income"][start:end].mean()

  def find_mean_incomes(row_step):
    mean_median_sample_incomes = []
    for i in range(0, income.shape[0], row_step):
      mean_median_sample_incomes.append(get_sample_mean(i, i+row_step)) # mean of consecutive blocks; with row_step=100: rows 0-99, 100-199, 200-299, ...
    return mean_median_sample_incomes

  nonrandom_sample = find_mean_incomes(100)
  plt.hist(nonrandom_sample, 20)
  plt.show()

 

  def select_random_sample(count):
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]

  random.seed(1)

  random_sample = [select_random_sample(100)["median_income"].mean() for _ in range(1000)] # means of 1000 random samples of 100 rows each
  plt.hist(random_sample, 20)
  plt.show()
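The difference between the two histograms can be reproduced on synthetic data. The sketch below is not from the original notes and uses only the standard library. It assumes the rows are stored in an order that correlates with income, here simply sorted ascending, which may or may not hold for the real income data.

```python
import random
import statistics

random.seed(1)
# Hypothetical data: 1000 incomes stored in ascending order, so neighbouring rows are similar.
incomes = sorted(random.gauss(50000, 10000) for _ in range(1000))

# Means of consecutive blocks of 100 rows (rows 0-99, 100-199, ...).
block_means = [statistics.mean(incomes[i:i + 100]) for i in range(0, len(incomes), 100)]

# Means of 1000 random samples of 100 rows each.
random_means = [statistics.mean(random.sample(incomes, 100)) for _ in range(1000)]

overall = statistics.mean(incomes)
block_spread = statistics.pstdev(block_means)
random_spread = statistics.pstdev(random_means)
print(round(overall), round(block_spread), round(random_spread))
```

Under that assumption the block means scatter widely around the overall mean, while the random-sample means cluster tightly around it.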

7. To compute a statistic that combines two columns of each sample:

  def select_random_sample(count): # return `count` randomly chosen rows from the data set
    random_indices = random.sample(range(0, income.shape[0]), count)
    return income.iloc[random_indices]

  random.seed(1)

  mean_ratios = []
  for i in range(1000): # loop 1000 times
    sample = select_random_sample(100)
    ratio = sample['median_income_hs']/sample['median_income_college']
    mean_ratios.append(ratio.mean()) # mean of the row-wise ratio between the two columns

  plt.hist(mean_ratios,20)
  plt.show()

8. Statistical significance, a way to judge whether a result reflects a real effect in the population rather than chance:

  significance_value = None

  count = 0
  for i in mean_ratios:
    if i > 0.675: # 0.675 is the ratio observed in another dataset
      count += 1
  # Share of the random-sample mean ratios above 0.675. The result is 0.014, i.e. only 1.4% of the
  # random samples show a ratio higher than the one measured after the program, so that ratio is
  # unlikely to be due to chance, which suggests the program was effective.
  significance_value = count / len(mean_ratios)
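The same calculation can be wrapped in a small helper. fraction_above() is a hypothetical name, and the ratios below are invented for illustration; they are not the values behind the result reported above.

```python
def fraction_above(values, threshold):
    """Return the share of values strictly greater than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)

# Hypothetical mean ratios, invented for illustration only.
demo_ratios = [0.60, 0.62, 0.65, 0.66, 0.68, 0.64, 0.63, 0.61, 0.67, 0.70]

# Two of the ten values (0.68 and 0.70) exceed 0.675.
demo_significance = fraction_above(demo_ratios, 0.675)
print(demo_significance)
```

The smaller this fraction, the less likely it is that a random sample alone would produce a ratio as high as the observed one.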


Source: http://www.cnblogs.com/kingoscar/p/6127957.html
