Jupyter Notebook Tips and Tricks

Date: 2021-01-01 22:49:52

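Scrollable output window

When a cell produces many lines of output, Jupyter Notebook collapses it into a scrollable window instead of showing everything at once. So far I have found two ways to change this:
1. With the mouse: select the code cell, open the Cell menu, choose Current Outputs, and click Toggle Scrolling.
2. With code: create a new cell containing the snippet below. Every cell after it will then show its full output.
I am not sure whether cells above it are affected as well, so it is safest to make this the first cell of the notebook.
Either of the two lines below achieves the same effect; I prefer the second.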

%%javascript
// IPython.OutputArea.auto_scroll_threshold = 9999;  // scroll only when the output exceeds 9999 lines
IPython.OutputArea.prototype._should_scroll = function () { return false; }  // never collapse output into a scrollable window

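Setting the maximum displayed rows, columns, and float format

In Jupyter Notebook, df.head(50) often elides rows and columns when the data is large, which makes inspection awkward.
The options below raise the display limits for rows and columns and fix the float format, so nothing is elided when you look at the data with head.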

import pandas as pd
pd.set_option('display.max_columns', 1000)
pd.set_option('display.max_rows', 300)
pd.set_option('display.float_format', lambda x: '%.5f' % x)
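
If you only need these settings for one display rather than for the whole session, pandas also provides pd.option_context, which restores the previous values when the block ends. A minimal sketch, assuming pandas is installed; the one-row DataFrame and its column name are made up for illustration:

```python
import pandas as pd

# Hypothetical one-row frame, used only to show the float format.
df = pd.DataFrame({"x": [1 / 3]})

# The options apply only inside the with-block and are restored afterwards.
with pd.option_context(
    "display.max_columns", 1000,
    "display.max_rows", 300,
    "display.float_format", "{:.5f}".format,
):
    text = repr(df)

print(text)
```

This keeps the global defaults untouched, which can be convenient in a shared notebook.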

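Showing line numbers in code cells

By default Jupyter Notebook does not show line numbers next to the code, which makes it harder to find the line an error message refers to.
Solution:
Click View --> Toggle Line Numbers.
That is all.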

Installing themes

Install the package:
pip install jupyterthemes

List the available themes:
jt -l

Apply a theme:
jt -t oceans16 -f fira -fs 13 -cellw 90% -ofs 11 -dfs 11 -T -N

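Pretty display of variables

The first part is well known: if the last line of a cell is a variable name or an unassigned expression, Jupyter displays its value without a print statement. This is especially useful with Pandas DataFrames, because the output is rendered as a neat table.
Less well known is that you can change the kernel option ast_node_interactivity so that Jupyter displays every variable or expression that stands on its own line, letting you see the results of several statements at once.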
In [1]: from IPython.core.interactiveshell import InteractiveShell
        InteractiveShell.ast_node_interactivity = "all"

In [2]: from pydataset import data
        quakes = data('quakes')
        quakes.head()
        quakes.tail()

Out[2]:
             lat    long  depth  mag  stations
    1     -20.42  181.62    562  4.8        41
    2     -20.62  181.03    650  4.2        15
    3     -26.00  184.10     42  5.4        43
    4     -17.97  181.66    626  4.1        19
    5     -20.42  181.96    649  4.0        11

Out[2]:
             lat    long  depth  mag  stations
    996   -25.93  179.54    470  4.4        22
    997   -12.28  167.06    248  4.7        35
    998   -20.13  184.20    244  4.5        34
    999   -17.40  187.80     40  4.5        14
    1000  -21.59  170.56    165  6.0       119

If you want this behaviour in every Jupyter environment (Notebook and Console), create the file ~/.ipython/profile_default/ipython_config.py with the following lines:

c = get_config()

# Run all nodes interactively
c.InteractiveShell.ast_node_interactivity = "all"
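
To make the difference between the default setting and "all" more concrete, here is a simplified model written with the standard ast module. It is only an illustration of the idea, not IPython's implementation; the function name displayed_expressions and the sample cell are made up, and the cell is parsed without being executed.

```python
import ast

def displayed_expressions(cell_source, mode="last_expr"):
    """Return the source of the expression statements a cell would echo.

    Simplified model: "last_expr" echoes only a trailing bare expression,
    "all" echoes every top-level bare expression.
    """
    tree = ast.parse(cell_source)
    if mode == "all":
        nodes = [n for n in tree.body if isinstance(n, ast.Expr)]
    else:  # "last_expr"
        last = tree.body[-1] if tree.body else None
        nodes = [last] if isinstance(last, ast.Expr) else []
    return [ast.get_source_segment(cell_source, n) for n in nodes]

cell = "quakes = load()\nquakes.head()\nquakes.tail()"
print(displayed_expressions(cell))         # ['quakes.tail()']
print(displayed_expressions(cell, "all"))  # ['quakes.head()', 'quakes.tail()']
```

The assignment on the first line is never echoed in either mode, because it is a statement rather than a bare expression.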

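Easy links to documentation

Under the Help menu you can find links to the online documentation of common libraries, including NumPy, Pandas, SciPy, and Matplotlib.
In addition, prefixing a library, method, or variable with ? opens its docstring as a quick syntax reference.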
In [3]: ?str.replace()
Docstring:
S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring
old replaced by new. If the optional argument count is
given, only the first count occurrences are replaced.
Type: method_descriptor
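
Outside IPython, roughly the same information is available from the standard inspect module. The sketch below assumes nothing beyond the standard library; the helper function scale is hypothetical and exists only to show a signature.

```python
import inspect

# inspect.getdoc returns the cleaned-up docstring of an object.
doc = inspect.getdoc(str.replace)
print(doc)

def scale(x, factor=2.0):
    """Multiply x by factor (hypothetical helper for this example)."""
    return x * factor

# inspect.signature shows the parameters and their defaults.
signature = str(inspect.signature(scale))
print(signature)  # (x, factor=2.0)
```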

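Plotting in notebooks

matplotlib (the de facto standard) (http://matplotlib.org/), activated with %matplotlib inline.
A matplotlib tutorial: (https://www.dataquest.io/blog/matplotlib-tutorial/)
%matplotlib notebook provides interactivity, but can be a little slow because rendering is done on the server side.
mpld3 (https://github.com/mpld3/mpld3) provides an alternative renderer (using d3) for matplotlib code. It is incomplete, but quite nice.
bokeh (http://bokeh.pydata.org/en/latest/) is a better option for building interactive plots.
plot.ly (https://plot.ly/) can generate very nice plots, but the hosted service is paid.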

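Suppressing the output of a final function call

Sometimes it is convenient to suppress the output of the function on the last line of a cell, for example when plotting. To do so, just add a semicolon at the end of that line.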

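Running shell commands

Running a shell command from inside a notebook is easy: prefix it with an exclamation mark. For example, this shows which datasets are in your working folder.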
In [7]: !ls *.csv
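
If you would rather have the file list as an ordinary Python list without going through the shell (which also works on systems without ls), the standard pathlib module can do the same job. A small sketch; the function name list_csv_files and the demo file names are made up, and the demo runs in a temporary folder so it does not depend on your own files.

```python
import tempfile
from pathlib import Path

def list_csv_files(folder="."):
    """Return the sorted names of the .csv files directly inside folder."""
    return sorted(p.name for p in Path(folder).glob("*.csv"))

# Demo in a throwaway directory with two CSV files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.csv", "a.csv", "notes.txt"):
        (Path(tmp) / name).write_text("demo")
    found = list_csv_files(tmp)

print(found)  # ['a.csv', 'b.csv']
```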

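Writing formulas with LaTeX

When you write LaTeX in a Markdown cell, it is rendered as a formula using MathJax. For example,
$$ P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)} $$
is displayed as a typeset fraction (Bayes' theorem).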

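Running code with different kernels in one notebook

If you want, you can combine code from several kernels in a single notebook.
Just put the Jupyter magic with the kernel's name at the start of each cell: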
%%bash

%%HTML

%%python2

%%python3

%%ruby

%%perl

In [6]: %%bash
for i in {1..5}
do
echo "i is $i"
done

        i is 1
        i is 2
        i is 3
        i is 4
        i is 5

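Installing other kernels for Jupyter

One of the nice features of Jupyter is its ability to run kernels for different languages. The steps below use the R kernel as an example.
The easy way: install the R kernel through Anaconda
conda install -c r r-essentials
The slightly harder way: install the R kernel manually
If you are not using Anaconda, the process is a bit more involved. First, install R from CRAN.
Then start an R console and run the following:
install.packages(c('repr', 'IRdisplay', 'crayon', 'pbdZMQ', 'devtools'))
devtools::install_github('IRkernel/IRkernel')
IRkernel::installspec()  # to register the kernel in the current R installation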

Running R and Python in the same notebook

The best way to do this is to install rpy2 (which requires a working R installation). With pip this is simple:
pip install rpy2
You can then use both languages together, and even pass variables between them:
In [1]: %load_ext rpy2.ipython
In [2]: %R require(ggplot2)
Out[2]: array([1], dtype=int32)
In [3]: import pandas as pd
        df = pd.DataFrame({
            'Letter': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
            'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
            'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
            'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]
        })
In [4]: %%R -i df
        ggplot(data = df) + geom_point(aes(x = X, y = Y, color = Letter, size = Z))

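Writing functions in other languages

Sometimes numpy is not fast enough and I need to write faster code.
In principle, you can compile a function into a dynamic library and write Python wrappers for it...
But it is much nicer when that tedious part is done for you.
You can write functions in Cython or Fortran and call them directly from Python code.
First, install the required packages:
!pip install cython fortran-magic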

In [ ]: %load_ext Cython
In [ ]: %%cython
        def multiply_by_2(float x):
            return 2.0 * x
In [ ]: multiply_by_2(23.)

Personally I like to use Fortran, which I find very convenient for writing number-crunching functions. More details are at ( http://arogozhnikov.github.io/2015/09/08/SpeedBenchmarks.html ).

In [ ]: %load_ext fortranmagic
In [ ]: %%fortran
        subroutine compute_fortran(x, y, z)
            real, intent(in) :: x(:), y(:)
            real, intent(out) :: z(size(x, 1))

            z = sin(x + y)

        end subroutine compute_fortran

In [ ]: compute_fortran([1, 2, 3], [4, 5, 6])

There are also various JIT systems that can speed up your Python code. More examples can be found at ( http://arogozhnikov.github.io/2015/09/08/SpeedBenchmarks.html ).

Multicursor support

Jupyter supports editing with multiple cursors at once, similar to Sublime Text. Hold down Alt and drag the mouse.

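Jupyter extensions

Jupyter-contrib extensions ( https://github.com/ipython-contrib/jupyter_contrib_nbextensions ) is a family of extensions that give Jupyter much more functionality, including a spell-checker and a code-formatter.
The following commands install the extensions, together with a menu-based configurator that lets you browse and enable them from the main Jupyter screen.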
!pip install https://github.com/ipython-contrib/jupyter_contrib_nbextensions/tarball/master
!pip install jupyter_nbextensions_configurator
!jupyter contrib nbextension install --user
!jupyter nbextensions_configurator enable --user

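Creating a presentation from a Jupyter notebook

Damian Avila's RISE (https://github.com/damianavila/RISE ) lets you create a PowerPoint-style presentation from an existing notebook.
You can install RISE with conda:
conda install -c damianavila82 rise
Or with pip:
pip install RISE
Then run the following to install and enable the extension: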
jupyter-nbextension install rise --py --sys-prefix
jupyter-nbextension enable rise --py --sys-prefix

The Jupyter output system

Notebooks are displayed as HTML, and cell output can be HTML as well, so you can output virtually anything: video, audio, images.
This example scans a folder of my images and shows thumbnails of the first five.
In [12]: import os
         from IPython.display import display, Image
         names = [f for f in os.listdir('../images/ml_demonstrations/') if f.endswith('.png')]
         for name in names[:5]:
             display(Image('../images/ml_demonstrations/' + name, width=100))

We can build the same list with a bash command, because magics and bash calls return Python variables:
In [10]: names = !ls ../images/ml_demonstrations/*.png
         names[:5]
Out[10]: ['../images/ml_demonstrations/colah_embeddings.png',
          '../images/ml_demonstrations/convnetjs.png',
          '../images/ml_demonstrations/decision_tree.png',
          '../images/ml_demonstrations/decision_tree_in_course.png',
          '../images/ml_demonstrations/dream_mnist.png']
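
Because cell output can be HTML, your own objects can take part in this output system too: when an object defines a _repr_html_ method, Jupyter uses its return value as the rich output. A minimal sketch; the class ColorSwatch is a made-up example, not part of any library.

```python
class ColorSwatch:
    """Hypothetical example class that renders itself as a colored HTML box."""

    def __init__(self, color):
        self.color = color

    def _repr_html_(self):
        # Jupyter calls this method, when present, to obtain rich HTML output.
        return (
            f'<div style="width:60px;height:20px;background:{self.color}">'
            "</div>"
        )

swatch = ColorSwatch("tomato")
html = swatch._repr_html_()
print(html)
```

In a notebook, leaving swatch as the last expression of a cell shows the colored box instead of the default text representation.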

Big data analysis

Several solutions are available for querying and processing large data samples:
ipyparallel (https://github.com/ipython/ipyparallel ) (formerly ipython cluster) is a good option for simple map-reduce operations in Python. We use it in rep to train many machine learning models in parallel.
pyspark (http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html )
spark-sql magic %%sql (https://github.com/jupyter-incubator/sparkmagic )

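Sharing notebooks

The easiest way to share a notebook is to pass along the notebook file (.ipynb), but for people who do not use notebooks you have these options:
Convert the notebook to an HTML file with the File > Download as > HTML menu.

Share the notebook file with gists (https://gist.github.com/ ) or on github. Both render notebooks; see this example (https://github.com/dataquestio/solutions/blob/master/Mission202Solution.ipynb ).

If you upload your notebook to a github repository, you can use the handy Mybinder (http://mybinder.org/ ) service to give someone half an hour of interactive Jupyter access to your repository.

Set up your own system with jupyterhub (https://github.com/jupyterhub/jupyterhub ). This is very handy when you organize a mini-course or workshop and do not have time to look after the students' machines.

Store your notebook on a site such as dropbox and put the link into nbviewer (http://nbviewer.jupyter.org/ ), which can render a notebook from any source where it is hosted.

Use the File > Download as > PDF menu to save the notebook as a PDF. If you go this route, I highly recommend reading Julius Schulz's article (http://blog.juliusschulz.de/blog/ultimate-ipython-notebook ).

Create a blog from your notebooks using Pelican (https://www.dataquest.io/blog/how-to-setup-a-data-science-blog/ ).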

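Jupyter Notebook keyboard shortcuts

As any power user knows, keyboard shortcuts save a lot of time. Jupyter keeps a list of shortcuts in the top menu under Help > Keyboard Shortcuts. It is worth checking this list each time you update Jupyter, because new shortcuts are added regularly. Another way to reach commands is the command palette: Cmd + Shift + P (or Ctrl + Shift + P on Linux and Windows). This dialog lets you run any command by name, which is especially useful when you do not know the shortcut for an action or when the action has no shortcut. It works much like Spotlight search on a Mac, and once you start using it you will wonder how you lived without it.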

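Command mode shortcuts

Enter : enter edit mode
Shift-Enter : run cell, select the cell below
Ctrl-Enter : run cell
Alt-Enter : run cell, insert a new cell below
Y : change cell to code
M : change cell to markdown
R : change cell to raw
1 : change cell to heading 1
2 : change cell to heading 2
3 : change cell to heading 3
4 : change cell to heading 4
5 : change cell to heading 5
6 : change cell to heading 6
Up : select cell above
K : select cell above
Down : select cell below
J : select cell below
Shift-K : extend selection to the cell above
Shift-J : extend selection to the cell below
A : insert cell above
B : insert cell below
X : cut selected cells
C : copy selected cells
Shift-V : paste cells above
V : paste cells below
Z : undo the last cell deletion
D,D : delete selected cells
Shift-M : merge selected cells
Ctrl-S : save the notebook
S : save the notebook
L : toggle line numbers
O : toggle output
Shift-O : toggle output scrolling
Esc : close the pager
Q : close the pager
H : show keyboard shortcut help
I,I : interrupt the kernel
0,0 : restart the kernel
Shift : ignore
Shift-Space : scroll up
Space : scroll down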

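Edit mode (press Enter to enable)

In edit mode the cell border is green and you can edit the cell's contents. The shortcuts in this mode are:
Esc : enter command mode
Shift-Enter : run cell, select the cell below
Ctrl-Enter : run cell
Alt-Enter : run cell, insert a new cell below
Tab : code completion or indent
Shift-Tab : tooltip
Ctrl-] : indent
Ctrl-[ : dedent
Ctrl-A : select all
Ctrl-Z : undo
Ctrl-Shift-Z : redo
Ctrl-Y : redo
Ctrl-Home : go to cell start
Ctrl-Up : go to cell start
Ctrl-End : go to cell end
Ctrl-Down : go to cell end
Ctrl-Left : go one word left
Ctrl-Right : go one word right
Ctrl-Backspace : delete the word before
Ctrl-Delete : delete the word after
Ctrl-M : enter command mode
Ctrl-Shift-- : split cell
Ctrl-Shift-Subtract : split cell
Ctrl-S : save the notebook
Shift : ignore
Up : move cursor up or to the previous cell
Down : move cursor down or to the next cell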

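The most frequently used Jupyter Notebook shortcuts

Shift+Tab: tooltip

Rating: ☆☆☆☆☆☆ (6 stars)
In VS Code, Ctrl + left click lets you jump into a function to look at its parameter list, examples, and so on.
Similarly, Shift+Tab in a notebook shows a tooltip for the function, which is a real blessing.
Usage: place the cursor on the function you want to inspect and press Shift+Tab.
An example:
The hist() function in matplotlib.pyplot draws histograms and has many parameters and features.
When you need it, it is easy to forget which features exist and what each parameter means.

For instance, the density parameter: what does density have to do with a histogram?
The parameter documentation explains:
density : bool, optional. If True, the first element of the return tuple will be the counts normalized to form a probability density, i.e., the area (or integral) under the histogram will sum to 1. This is achieved by dividing the count by the number of observations times the bin width and not dividing by the total number of observations. If stacked is also True, the sum of the histograms is normalized to 1.
Default is None for both normed and density. If either is set, then that value will be used. If neither are set, then the args will be treated as False. If both density and normed are set an error is raised.
I did not see this explanation in VS Code (probably because I was not using the right shortcut). In any case, the explanation here is thorough and very helpful.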

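Esc to enter command mode, then M (written "esc, m")

Rating: ☆☆☆☆☆ (5 stars)
Press Esc to enter command mode, then press M to convert the current cell (the official translation calls it a code block) to markdown.
Press Esc to enter command mode, then press Y to convert the current cell to code.

M is the first letter of markdown, but what does Y have to do with code? Why Y rather than C? Because C is already taken: it copies the cell.
The official hints and many blog posts show an uppercase M, but you press the key on its own, without Shift (Shift-M merges cells).

Notation used here:
esc, c : press Esc first, then C
esc+c : press both together

Ctrl+Z: undo

Rating: ☆☆☆☆☆ (5 stars)
Undoes the last step, just as in VS Code.
It can be used in either mode.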

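Alt+Enter: run the cell and insert a new cell below.

Esc + F: find and replace in your code, ignoring outputs.

Esc + O: toggle the cell output.

Selecting multiple cells:

Shift + J or Shift + Down selects the next cell.
Shift + K or Shift + Up selects the previous cell.
Once cells are selected, you can delete, copy, cut, paste, or run them as a batch. This is useful when you need to move part of a notebook.

Shift + M: merge cells.

Rating: ☆☆☆☆☆☆ (6 stars)
Needless to say, every feature you can think of is here.
When you find yourself repeating the same operation several times and it gets tedious, look through the list.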

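Jupyter magic commands

%matplotlib inline is one of the Jupyter magic commands.
I recommend reading the documentation for the magic commands
( http://ipython.readthedocs.io/en/stable/interactive/magics.html ); you will certainly find it helpful. Below are some of my favourites: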

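Magic - %env: set environment variables

You can manage environment variables for your notebook without restarting the jupyter server process. Some libraries (such as theano) use environment variables to control their behaviour, and %env is the most convenient way to set them.
In [55]: # Running %env without any arguments
         # lists all environment variables

         # The line below sets the environment
         # variable OMP_NUM_THREADS
         %env OMP_NUM_THREADS=4

         env: OMP_NUM_THREADS=4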
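
In a plain Python script, where magics are not available, the same variable can be set through os.environ. A minimal sketch using only the standard library:

```python
import os

# Environment variable values are always strings.
os.environ["OMP_NUM_THREADS"] = "4"

# Read it back; .get avoids a KeyError if the variable is unset.
threads = os.environ.get("OMP_NUM_THREADS")
print(threads)
```

Libraries that read a variable only when they are imported need it set before the import.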

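Magic - %run: execute Python code

%run can execute Python code from .py files, which is well known. Less well known is that it can also execute other jupyter notebooks, which is quite useful.
Note that using %run is not the same as importing a Python module.
In [56]: # this will execute and show the output from
         # all code cells of the specified notebook
         %run ./two-histograms.ipynb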
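
One visible difference between running and importing is the value of __name__: a file executed as a program sees "__main__", while an imported module sees its own module name. The sketch below shows this with the standard runpy and importlib modules standing in for %run and import; the script contents and the name demo_script are made up for the demo.

```python
import importlib.util
import runpy
import tempfile
from pathlib import Path

# Hypothetical script, written to a temporary folder for the demo.
source = (
    "mode = 'imported'\n"
    "if __name__ == '__main__':\n"
    "    mode = 'run as script'\n"
)

with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "demo_script.py"
    script.write_text(source)

    # Executed as a program: __name__ is "__main__", so the guarded block runs.
    as_script = runpy.run_path(str(script), run_name="__main__")["mode"]

    # Imported as a module: __name__ is "demo_script", so the block is skipped.
    spec = importlib.util.spec_from_file_location("demo_script", str(script))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    as_module = module.mode

print(as_script)  # run as script
print(as_module)  # imported
```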

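Magic - %load: insert the code from an external script

This replaces the contents of the current cell with an external script. The source can be a file on your computer or a URL.
In [ ]: # Before Running
        %load ./hello_world.py
In [61]: # After Running
         # %load ./hello_world.py
         if __name__ == "__main__":
             print("Hello World!")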

Magic - %store: pass variables between notebooks

The %store command lets you pass variables between two different notebooks.

In [62]: data = 'this is the string I want to pass to different notebook'
         %store data
         del data  # This has deleted the variable

Stored 'data' (str)
Now, in a new notebook...

In [1]: %store -r data
        print(data)

this is the string I want to pass to different notebook
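
Conceptually, this hand-off amounts to serializing the value to disk in one session and reading it back in another. The sketch below imitates that with the standard pickle module; it is not IPython's actual storage code, and the file name data.pkl is made up.

```python
import pickle
import tempfile
from pathlib import Path

data = "this is the string I want to pass to different notebook"

with tempfile.TemporaryDirectory() as tmp:
    store_file = Path(tmp) / "data.pkl"  # hypothetical storage location

    # "First notebook": persist the variable, then forget it.
    store_file.write_bytes(pickle.dumps(data))
    del data

    # "Second notebook": restore the variable from disk.
    restored = pickle.loads(store_file.read_bytes())

print(restored)
```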

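Magic - %who: list all global variables

Without arguments, %who lists all variables in the global scope. Passing a type name such as str lists only the variables of that type.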
In [1]: one = "for the money"
two = "for the show"
three = "to get ready now go cat go"
%who str

one three two
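
The filtering shown above can be approximated in plain Python by scanning a namespace dictionary. A small sketch; the function names_of_type is made up, and the sample dictionary stands in for what globals() would return in a notebook.

```python
def names_of_type(namespace, wanted_type):
    """Return sorted names of public variables in namespace with the given type."""
    return sorted(
        name
        for name, value in namespace.items()
        if not name.startswith("_") and isinstance(value, wanted_type)
    )

# Hypothetical namespace standing in for a notebook's globals().
namespace = {
    "one": "for the money",
    "two": "for the show",
    "three": "to get ready now go cat go",
    "count": 3,
}

print(names_of_type(namespace, str))  # ['one', 'three', 'two']
```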

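Magic - timing

There are two magic commands for timing: %%time and %timeit. They are particularly handy when some code is slow and you want to find out where the time goes.
%%time reports the time taken by a single run of the code in the cell.
In [4]: %%time
        import time
        for _ in range(1000):
            time.sleep(0.01)  # sleep for 0.01 seconds

CPU times: user 21.5 ms, sys: 14.8 ms, total: 36.3 ms
Wall time: 11.6 s
%timeit (or %%timeit for a whole cell) uses Python's timeit module. It runs the statement many times (100,000 loops in the example below), repeats that measurement, and in the IPython version shown here reports the best of 3 repeats.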
In [3]: import numpy
%timeit numpy.random.normal(size=100)

The slowest run took 7.29 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 5.5 μs per loop
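
The same kind of measurement can be made directly with the timeit module, which is useful in scripts where magics are unavailable. In this sketch the loop count and the timed statement are arbitrary choices for illustration, and the timings will differ from machine to machine.

```python
import timeit

# Time a small statement: 3 repeats of 10,000 loops each.
loops = 10_000
totals = timeit.repeat(
    stmt="sum(values)",
    setup="values = list(range(100))",
    repeat=3,
    number=loops,
)

# Each entry is the total time for all loops in one repeat.
best_per_loop = min(totals) / loops
print(f"{loops} loops, best of {len(totals)}: {best_per_loop * 1e6:.2f} µs per loop")
```

Taking the minimum rather than the mean follows the "best of 3" style of the output above: slower repeats are usually caused by other activity on the machine.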

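Magic - %%writefile and %pycat: export a cell's contents / show the contents of an external script

The %%writefile magic saves the contents of a cell to an external file. %pycat does the opposite: it shows the syntax-highlighted contents of an external file (in a popup).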
In [7]: %%writefile pythoncode.py

    import numpy
    def append_if_not_exists(arr, x):
        if x not in arr:
            arr.append(x)

    def some_useless_slow_function():
        arr = list()
        for i in range(10000):
            x = numpy.random.randint(0, 10000)
            append_if_not_exists(arr, x)

Writing pythoncode.py

In [8]: %pycat pythoncode.py

    import numpy
    def append_if_not_exists(arr, x):
        if x not in arr:
            arr.append(x)

    def some_useless_slow_function():
        arr = list()
        for i in range(10000):
            x = numpy.random.randint(0, 10000)
            append_if_not_exists(arr, x)

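Magic - %prun: show how much time your program spends in each function

%prun statement_name gives you an ordered table showing how many times each internal function was called within the statement, the time each call took, and the cumulative time of all runs of the function.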
In [47]: %prun some_useless_slow_function()
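
A comparable table can be produced outside the notebook with the standard cProfile and pstats modules. The sketch below reuses the function names from the %%writefile example, but swaps numpy.random for the standard random module and uses a smaller loop so that it runs without extra dependencies.

```python
import cProfile
import io
import pstats
import random

def append_if_not_exists(arr, x):
    if x not in arr:
        arr.append(x)

def some_useless_slow_function():
    arr = []
    for _ in range(2000):
        append_if_not_exists(arr, random.randint(0, 2000))

profiler = cProfile.Profile()
profiler.enable()
some_useless_slow_function()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```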

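Magic - debugging with %pdb

Jupyter has its own interface for The Python Debugger (pdb) ( https://docs.python.org/3.5/library/pdb.html ), which makes it possible to step inside a function and investigate what goes wrong.
The commands available in pdb are listed here ( https://docs.python.org/3.5/library/pdb.html#debugger-commands )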
In [ ]: %pdb

    def pick_and_take():
        picked = numpy.random.randint(0, 1000)
        raise NotImplementedError()

    pick_and_take()
    Automatic pdb calling has been turned ON
    ---------------------------------------------------------------------------
    NotImplementedError                       Traceback (most recent call last)
    <ipython-input-24-0f6b26649b2e> in <module>()
          5     raise NotImplementedError()
          6 
    ----> 7 pick_and_take()

    <ipython-input-24-0f6b26649b2e> in pick_and_take()
          3 def pick_and_take():
          4     picked = numpy.random.randint(0, 1000)
    ----> 5     raise NotImplementedError()
          6 
          7 pick_and_take()

    NotImplementedError: 
    > <ipython-input-24-0f6b26649b2e>(5)pick_and_take()
          3 def pick_and_take():
          4     picked = numpy.random.randint(0, 1000)
    ----> 5     raise NotImplementedError()
          6 
          7 pick_and_take()

    ipdb>

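Learning from the Jupyter Notebook help documentation

The Help menu contains links to the official documentation of commonly used packages.
When you run into a problem, check the official documentation first, especially the examples; they help a lot.
The matplotlib documentation, for instance, has detailed explanations and examples for each function, which is excellent.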

Original: https://www.cnblogs.com/zflearning/p/14220901.html
