Data Visualization#

Styling Tables#

Formatting with pandas#

Table Visualization — pandas 2.2.3 documentation

import pandas as pd
import numpy as np

# Sample data
np.random.seed(0)
df = pd.DataFrame({"A": np.random.poisson(lam=10, size=5), "B": np.random.randn(5)})

Rounding decimals#

df.style.format(precision=3, thousands=",", decimal=".")
    A      B
0  10  0.144
1  11  1.454
2   9  0.761
3   9  0.122
4  18  0.444
# To specify a format per column
df.style.format({"A": "{:.1f}", "B": "{:.1%}"})
      A       B
0  10.0   14.4%
1  11.0  145.4%
2   9.0   76.1%
3   9.0   12.2%
4  18.0   44.4%

style.format relies on the Styler object (df.style), which requires an additional package, Jinja2.
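To decide at runtime whether the Styler is available, one option is to check for Jinja2 before touching df.style. The following is a minimal sketch using only the standard library; the variable name has_jinja2 is chosen here for illustration.

```python
import importlib.util

# find_spec returns None when the package cannot be found in this environment
has_jinja2 = importlib.util.find_spec("jinja2") is not None

if has_jinja2:
    print("Jinja2 found: df.style should be usable")
else:
    print("Jinja2 not found: accessing df.style is expected to raise ImportError")
```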

With pandas alone, you can combine .apply() with the standard string formatting method str.format.

df["B"].apply("{:.1%}".format)
0     14.4%
1    145.4%
2     76.1%
3     12.2%
4     44.4%
Name: B, dtype: object
# Note: columns not listed in the dict are omitted from the result
df.apply({"A": "{:.1f}".format, "B": "{:.1%}".format})
      A       B
0  10.0   14.4%
1  11.0  145.4%
2   9.0   76.1%
3   9.0   12.2%
4  18.0   44.4%
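If the unlisted columns should be kept, one possible workaround is to overwrite only the formatted columns with assign(). This is a sketch on a small hand-written frame rather than the sample data above; the column "C" and the formats dict are assumptions made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 11], "B": [0.144, 1.454], "C": ["x", "y"]})

# Hypothetical mapping of column name -> format string
formats = {"A": "{:.1f}", "B": "{:.1%}"}

# assign() replaces only the listed columns; "C" is carried over unchanged
formatted = df.assign(
    **{col: df[col].map(fmt.format) for col, fmt in formats.items()}
)
print(formatted)
```

The formatted columns become strings (dtype object), so this is best done as the last step before display.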

Coloring cells by value#

import seaborn as sns
cm = sns.light_palette("green", as_cmap=True)

df.style.background_gradient(cmap=cm)
    A         B
0  10  0.144044
1  11  1.454274
2   9  0.761038
3   9  0.121675
4  18  0.443863

Drawing in-cell bar charts#

df.style.bar(subset=["A", "B"], color='#d65f5f')
    A         B
0  10  0.144044
1  11  1.454274
2   9  0.761038
3   9  0.121675
4  18  0.443863

Making polished tables with the great_tables package#

Examples – great_tables

It can produce many kinds of tables, from concise, paper-style tables to casual ones that use icons and color.

from great_tables import GT, html
from great_tables.data import airquality

airquality_m = airquality.head(5).assign(Year=1973)

gt_airquality = (
    GT(airquality_m)
    .tab_header(
        title="New York Air Quality Measurements",
        subtitle="Daily measurements in New York City (May 1-5, 1973)",
    )
    .tab_spanner(label="Time", columns=["Year", "Month", "Day"])
    .tab_spanner(label="Measurement", columns=["Ozone", "Solar_R", "Wind", "Temp"])
    .cols_move_to_start(columns=["Year", "Month", "Day"])
    .cols_label(
        Ozone=html("Ozone,<br>ppbV"),
        Solar_R=html("Solar R.,<br>cal/m<sup>2</sup>"),
        Wind=html("Wind,<br>mph"),
        Temp=html("Temp,<br>&deg;F"),
    )
)

gt_airquality
New York Air Quality Measurements
Daily measurements in New York City (May 1-5, 1973)

      Time                  Measurement
Year  Month  Day  Ozone,  Solar R.,  Wind,  Temp,
                  ppbV    cal/m2     mph    °F
1973  5      1    41.0    190.0      7.4    67
1973  5      2    36.0    118.0      8.0    72
1973  5      3    12.0    149.0      12.6   74
1973  5      4    18.0    313.0      11.5   62
1973  5      5                       14.3   56

Plotting#

autofmt_xdate(): tidying date labels#

Rotates and right-aligns the date tick labels on the x-axis so that they are less likely to overlap.

matplotlib.figure.Figure.autofmt_xdate — Matplotlib 3.9.2 documentation

import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range('2023-10-01', periods=10, freq='D')
values = [5, 3, 4, 6, 7, 2, 8, 5, 6, 7]

fig, ax = plt.subplots()
ax.plot(dates, values)
fig.autofmt_xdate(rotation=45, ha='right')
plt.show()
[Figure: line plot of values against dates, with the x-axis date labels rotated 45 degrees and right-aligned]
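autofmt_xdate only changes the rotation and alignment of the labels; the label text itself can be controlled with a date formatter. The sketch below combines the two, assuming a month/day format is wanted. The Agg backend is selected only so that the code runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.date_range("2023-10-01", periods=10, freq="D")
values = [5, 3, 4, 6, 7, 2, 8, 5, 6, 7]

fig, ax = plt.subplots()
ax.plot(dates, values)

# Show tick labels as month/day
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d"))

# Rotate and right-align the x tick labels
fig.autofmt_xdate(rotation=45, ha="right")

# Draw once so the tick labels are materialized, then inspect them
fig.canvas.draw()
labels = ax.get_xticklabels()
```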