Python | MasaTANO HP

2つの変数間での統計量の計算
最小２乗法による重回帰式の算定
空間内挿
Tips

2つの変数間での統計量の計算

相関係数や一次回帰式の求め方。

回帰直線

最後の整数は回帰式の次数に相当。

import numpu as np
a, b = np.polyfit(Dat1, Dat2, 1)
RegLineX = np.array([np.min(Dat1), np.max(Dat2)])
RegLineY = [a*x+b for x in RegLineX]

相関係数、P値

from scipy.stats import pearsonr
r, p = pearsonr(Model, EMDAT)

最小２乗法による重回帰式の算定

Install in conda environment

$ conda install statsmodels

Install modules

import statsmodels.api as sm
from statsmodels.sandbox.regression.predstd import wls_prediction_std

Set regression line

従属変数Yを求める説明変数を定義する。回帰式の形はコメントを参照。切片を求めるためには値1の列を挿入する必要あり(np.repeat(1,nsample))。


nsample = x.size

Set dependent variable
Y = y

# Set independent variable
# in the case of y=a+bx+cx^2
X1 = np.column_stack((np.repeat(1, nsample), x, x**2))

# in the case of y=a+bx
X2 = np.column_stack((np.repeat(1, nsample), x))

# in the case of y=ax
X3 = x

Calculate regression line by OLS

res = sm.OLS(Y, X).fit()

Check results by OLS

print res.summary()

Calculate uncertanity range

prstdの意味がよくわらからなが、iv_lとiv_uが信頼性区間の下限と上限にあたる回帰式。alphaで信頼区間を設定できる。今回は10%。

prstd, iv_l, iv_u = wls_prediction_std(res, alpha=0.09)

Mann-KendallとTeil-sen slope

import scipy.stats as sp
# Mann-Kendall
tau, prob = sp.kendalltau(x, y)

# Teil-sen slope
res = sp.mstats.theilslopes(dat)
slope = res[0]  # slope
inter = res[1]  # intercept

空間内挿

ctlファイルからデータを読み取り、pythonのscipy.interpolate.RegularGridInterpolaterにより、空間内挿する参考プログラム。多次元に拡張可能。

#!/usr/bin/env python
import numpy as np
from scipy.interpolate import RegularGridInterpolator
from grads import GaNum

fDatCtl = 'sst.ctl'
f1dgCtl = 'deg1x1grids.ctl'  # 空間内挿後のグリッドと同じグリッドを持つctlファイル。

ga = GaNum('grads -l')

ga.open(f1dgCtl)
ga('set x 1 360')
lon1deg = ga.exp('lon').reshape(-1)
lat1deg = ga.exp('lat').reshape(-1)
ga('reinit')
grid1deg = np.array([ [y0, x0] for y0, x0 in zip(lat1deg, lon1deg)])

Out = np.zeros([12, len(lon1deg)])   # 出力したいグリッド数。

#  Prepare for original grid information.
ga.open(fDatCtl)
ga('set y 1')
x = ga.exp('lon')
ga('set y 1 120')
ga('set x 1')
y = ga.exp('lat')
y[0] = -90.   # 外挿できないので、端の値は-90度とする。
y[-1] = 90.   # 外挿できないので、端の値は90度とする。
ga('reinit')

#  Main
ga.open(fDatCtl)
for t in range(1, 13):
    #  Prepare interpolate function
    ga('set t %s' % t)
    dat = ga.exp('var')
    func = RegularGridInterpolator((y, x), dat)

    #  Interpolate
    Out[t-1, :] = func(grid1deg)

Tips

グリッドデータのマスク集計

例えば、グリッドごとに算定した被害額を国ごとに集計する場合のわざ。前はFortranを噛ましてたが、numpyとpandasで簡単にできる。

import numpy as np
import pandas as pd

# DamageとMaskは一次元配列。
List = np.array([MskTmp1, DamageTmp1]).T
df = pd.DataFrame(List, columns=('ID', 'Damage'))
dfpv = pd.pivot_table(df, values=['Damage'], index=['ID'], aggfunc=np.sum).reset_index()

Contents