Statistics and Linear Algebra 5

Posted: 2016-12-03 09:51:32

1. Finding the minimum value in pandas:

　　lowest_income_county = income["county"][income["median_income"].idxmin()] # idxmin() returns the index label of the row with the smallest median_income

　　high_pop_county = income[income["pop_over_25"] > 500000]

　　lowest_income_high_pop_county = high_pop_county["county"][high_pop_county["median_income"].idxmin()] # among counties with more than 500000 residents over 25, find the one with the lowest median income
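A minimal sketch of the same pattern on a small made-up table. The county names and numbers below are hypothetical and are not taken from the income data set; the point is only that idxmin() returns an index label, which can then be passed to .loc.

```python
import pandas as pd

# Hypothetical toy data standing in for the income table.
toy_income = pd.DataFrame({
    "county": ["A", "B", "C", "D"],
    "median_income": [42000, 31000, 55000, 38000],
    "pop_over_25": [600000, 120000, 900000, 510000],
})

# idxmin() returns the index label of the smallest value in the column.
lowest_label = toy_income["median_income"].idxmin()
lowest_county = toy_income.loc[lowest_label, "county"]

# Filter first, then look up the minimum inside the filtered frame.
high_pop = toy_income[toy_income["pop_over_25"] > 500000]
lowest_high_pop_county = high_pop.loc[high_pop["median_income"].idxmin(), "county"]

print(lowest_county, lowest_high_pop_county)
```

Using .loc with the label keeps the lookup correct even after filtering, because the filtered frame keeps the original index labels.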

2. Seeding the random module: after random.seed is called, the sequence of values returned by the following random calls is reproducible:

　　import random

　　random.seed(20) # set the random seed

　　new_sequence = [random.randint(0,10) for _ in range(10)]
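A small sketch of what seeding does, using only the standard library. The helper name draw is hypothetical. Re-seeding with the same value replays the same sequence, while calls made without re-seeding simply continue that sequence.

```python
import random

def draw(seed, n=5):
    """Seed the generator, then draw n integers between 0 and 10 inclusive."""
    random.seed(seed)
    return [random.randint(0, 10) for _ in range(n)]

first = draw(20)
second = draw(20)  # same seed, so the same sequence is replayed
print(first == second)

# Without re-seeding, later calls continue the sequence started by the seed.
random.seed(20)
a = [random.randint(0, 10) for _ in range(5)]
b = [random.randint(0, 10) for _ in range(5)]
print(a + b == draw(20, 10))
```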

3. To select a given number of samples from the data:

　　shopping_sample = random.sample(shopping, 4) # pick 4 items at random, without replacement, from the list shopping
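A short sketch of how random.sample behaves, on a hypothetical list (shopping_demo and its values are made up for illustration). It draws without replacement and leaves the original list untouched.

```python
import random

# Hypothetical purchase amounts, standing in for the list `shopping`.
shopping_demo = [300, 200, 100, 600, 20, 75, 410]

random.seed(1)
picked = random.sample(shopping_demo, 4)

# sample() draws without replacement, so no position is picked twice,
# and the source list keeps its length and order.
print(picked)
print(len(shopping_demo))
```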

4. Roll a die 10 times (values 1 to 6) and plot the results as a histogram with 6 bins:

　　import matplotlib.pyplot as plt

　　def roll():
　　　　return random.randint(1, 6) # return a random integer from 1 to 6, inclusive

　　random.seed(1)
　　small_sample = [roll() for _ in range(10)]

　　plt.hist(small_sample, 6)
　　plt.show()
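As a text-only counterpart to the histogram, the counts per face can be tallied directly. This is a sketch using collections.Counter; the name roll_die is hypothetical and mirrors roll() above.

```python
import random
from collections import Counter

def roll_die():
    """Return a random integer from 1 to 6, inclusive."""
    return random.randint(1, 6)

random.seed(1)
rolls = [roll_die() for _ in range(10)]

# Counter maps each face to how often it appeared; missing faces count as 0.
counts = Counter(rolls)
for face in range(1, 7):
    print(face, counts[face])
```

With only 10 rolls the counts are usually uneven, which is what the small histogram shows as well.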

5. Roll the die 50 times per trial and repeat the experiment 300 times, recording the proportion of ones in each trial:

　　def probability_of_one(num_trials, num_rolls):
　　　　probabilities = []
　　　　for i in range(num_trials):
　　　　　　die_rolls = [roll() for _ in range(num_rolls)]
　　　　　　one_prob = len([d for d in die_rolls if d==1]) / num_rolls
　　　　　　probabilities.append(one_prob)
　　　　return probabilities

　　random.seed(1)
　　small_sample = probability_of_one(300, 50)
　　plt.hist(small_sample, 20)
　　plt.show()
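A sketch, under the assumption of a fair die, of how the spread of these proportions changes with the number of rolls per trial. The function name proportion_of_ones and the choice of 1000 rolls are illustrative and not from the original notes.

```python
import random
import statistics

def proportion_of_ones(num_trials, num_rolls):
    """For each trial, roll a die num_rolls times and record the share of ones."""
    proportions = []
    for _ in range(num_trials):
        rolls = [random.randint(1, 6) for _ in range(num_rolls)]
        proportions.append(rolls.count(1) / num_rolls)
    return proportions

random.seed(1)
few_rolls = proportion_of_ones(300, 50)
many_rolls = proportion_of_ones(300, 1000)

# Both sets of proportions centre near 1/6, but the spread is much
# narrower when each trial contains more rolls.
print(statistics.mean(few_rolls), statistics.stdev(few_rolls))
print(statistics.mean(many_rolls), statistics.stdev(many_rolls))
```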

6. Random sampling gives a more representative picture than taking consecutive (non-random) slices of the data:

　　mean_median_income = income["median_income"].mean()
　　print(mean_median_income)

　　def get_sample_mean(start, end):
　　　　return income["median_income"][start:end].mean()

　　def find_mean_incomes(row_step):
　　　　mean_median_sample_incomes = []
　　　　for i in range(0, income.shape[0], row_step):
　　　　　　mean_median_sample_incomes.append(get_sample_mean(i, i+row_step)) # mean of each consecutive block of rows: 0-99, 100-199, 200-299, ...
　　　　return mean_median_sample_incomes

　　nonrandom_sample = find_mean_incomes(100)
　　plt.hist(nonrandom_sample, 20)
　　plt.show()

　　def select_random_sample(count):
　　　　random_indices = random.sample(range(0, income.shape[0]), count)
　　　　return income.iloc[random_indices]

　　random.seed(1)

　　random_sample = [select_random_sample(100)["median_income"].mean() for _ in range(1000)] # mean median income of 100 randomly chosen rows, repeated 1000 times
　　plt.hist(random_sample, 20)
　　plt.show()
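A self-contained sketch of the same comparison on synthetic data. It assumes, hypothetically, that the rows are stored in a meaningful order (here simply sorted), which is when consecutive slices mislead the most; the values and function names are made up for illustration.

```python
import random
import statistics

# Hypothetical data: 1000 values stored in sorted order.
values = list(range(1000))

def chunk_means(data, step):
    """Means of consecutive, non-overlapping slices of length step."""
    return [statistics.mean(data[i:i + step]) for i in range(0, len(data), step)]

def random_sample_means(data, count, repeats):
    """Means of `repeats` random samples of size `count`, drawn without replacement."""
    return [statistics.mean(random.sample(data, count)) for _ in range(repeats)]

random.seed(1)
nonrandom = chunk_means(values, 100)
randomized = random_sample_means(values, 100, 200)

# Consecutive slices give means scattered across the whole range,
# while random samples cluster around the overall mean of 499.5.
print(min(nonrandom), max(nonrandom))
print(min(randomized), max(randomized))
```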

7. To compute a quantity from two columns of each random sample (here, the ratio of two income columns):

　　def select_random_sample(count): # return "count" randomly chosen rows from the data set
　　　　random_indices = random.sample(range(0, income.shape[0]), count)
　　　　return income.iloc[random_indices]

　　random.seed(1)

　　mean_ratios = []
　　for i in range(1000): # loop 1000 times
　　　　sample = select_random_sample(100)
　　　　ratio = sample['median_income_hs'] / sample['median_income_college']
　　　　mean_ratios.append(ratio.mean()) # take the mean of the row-wise ratio of the two columns and append it to the result list

　　plt.hist(mean_ratios, 20)
　　plt.show()

8. Statistical significance: a way to judge whether an observed result could plausibly have arisen by chance in the population:

　　significance_value = None

　　count = 0
　　for i in mean_ratios:
　　　　if i > 0.675: # We get 0.675 from another dataset
　　　　　　count += 1
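　　significance_value = count / len(mean_ratios) # fraction of sample mean ratios above 0.675
　　# The notes give this result as 0.14 but read it as 1.4% (which would be 0.014). On the 1.4% reading, only about 1.4% of random county samples have a mean ratio above the 0.675 observed after the program, so a ratio that high is unlikely to arise by chance, which suggests the program had a real effect.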
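The counting loop above can be sketched as a small reusable function. The name fraction_above and the demo values are hypothetical, chosen only to show the arithmetic; they are not results from the income data.

```python
def fraction_above(values, threshold):
    """Share of values strictly greater than threshold."""
    return sum(1 for v in values if v > threshold) / len(values)

# Hypothetical mean ratios, only to illustrate the calculation.
demo_ratios = [0.61, 0.64, 0.66, 0.68, 0.65, 0.63, 0.70, 0.62, 0.64, 0.66]

# Two of the ten values (0.68 and 0.70) exceed 0.675.
print(fraction_above(demo_ratios, 0.675))
```

A small fraction means the observed value is rarely matched by random samples from the population, which is the reasoning used in the notes.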

Source: http://www.cnblogs.com/kingoscar/p/6127957.html