[Numpy Exercise 100] Solutions to Problems 91 to 100


Original problems

https://github.com/rougier/numpy-100


91. How to create a record array from a regular array? (★★★)

How to create a record array from a regular array.
A record array lets you access the fields of a structured array as attributes instead of by index.
Related documentation: https://numpy.org/doc/stable/user/basics.rec.html


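import numpy as np

Z = np.array([("Hello", 2.5, 3),
              ("World", 3.6, 2)])
# np.rec.fromarrays is the public alias of np.core.records.fromarrays;
# the np.core path is deprecated in NumPy 2.0
R = np.rec.fromarrays(Z.T,
                      names='col1, col2, col3',
                      formats = 'S8, f8, i8')
print(R)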

>>>
[(b'Hello', 2.5, 3) (b'World', 3.6, 2)]
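
To make the attribute access mentioned above concrete, here is a minimal sketch that builds a record array from separate column arrays. The field names `label`, `score`, and `qty` are made up for this example and do not come from the original exercise.

```python
import numpy as np

# Hypothetical columns; the field names are chosen only for this example.
labels = np.array(["Hello", "World"])
scores = np.array([2.5, 3.6])
qtys = np.array([3, 2])

R = np.rec.fromarrays([labels, scores, qtys], names="label, score, qty")

print(R.score)      # field accessed as an attribute
print(R["score"])   # the same field accessed by key
print(R[0].label)   # attribute access on a single record
```

Letting NumPy infer the formats from the column arrays avoids the string-to-number casts that the `Z.T` version relies on.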

92. Consider a large vector Z, compute Z to the power of 3 using 3 different methods (★★★)

Consider a large vector Z and compute Z to the power of 3 using three different methods.
Three ways to cube a vector:
- the np.power() function
- simply multiplying the vector by itself (x*x*x)
- the np.einsum() function: https://ita9naiwa.github.io/numeric%20calculation/2018/11/10/Einsum.html (by ita9naiwa)

x = np.random.rand(int(5e7))

%timeit np.power(x,3)
%timeit x*x*x
%timeit np.einsum('i,i,i->i',x,x,x) # elementwise product of three copies of x

>>>
1.44 s ± 6.19 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
215 ms ± 2.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
130 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
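
The timings above depend on the machine and need IPython for `%timeit`. As a plain-Python sanity check, the following sketch confirms that the three methods return the same values; the small vector size and the seed are arbitrary choices for this example.

```python
import numpy as np

x = np.random.default_rng(0).random(1000)  # small vector, used only for checking

a = np.power(x, 3)
b = x * x * x
c = np.einsum('i,i,i->i', x, x, x)

# All three results should agree up to floating-point rounding
print(np.allclose(a, b), np.allclose(a, c))
```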

93. Consider two arrays A and B of shape (8,3) and (2,2). How to find rows of A that contain elements of each row of B regardless of the order of the elements in B? (★★★)

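Consider two arrays A and B of shapes (8,3) and (2,2). How do you find the rows of A that contain elements of each row of B, regardless of the order of the elements in B?
Append two axes to A (so that each innermost row holds a single scalar) and compare it with B to see whether each element of A equals any of the four elements of B.
any((3,1)) reduces over axes 3 and 1: for each row of A and each row of B it returns True if there is at least one match, and False otherwise.
all(1) keeps only the rows of A for which both values (one per row of B) are True.
np.where() extracts the row indices of the result.

# Final solution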

A = np.random.randint(0,5,(8,3))
B = np.random.randint(0,5,(2,2))
C = (A[..., np.newaxis, np.newaxis] == B)
rows = np.where(C.any((3,1)).all(1))[0]
print(rows)

>>>
[2 6 7]

Code explanation

# A with two axes appended ( (8,3) --> (8,3,1,1) ), so only scalars remain in each innermost row
A[..., np.newaxis, np.newaxis]

>>>
[[[[4]]

  [[4]]

  [[3]]]


 [[[0]]

  [[0]]

  [[1]]]


 ...


 [[[3]]

  [[0]]

  [[0]]]


 [[[3]]

  [[1]]

  [[1]]]]

# Compare the expanded A with B (the result is a boolean array)
C = (A[..., np.newaxis, np.newaxis] == B)
print(C)

>>>
[[[[False  True]
   [False  True]]

  [[False  True]
   [False  True]]

  [[False False]
   [False False]]]


 [[[ True False]
   [False False]]

  [[ True False]
   [False False]]

  [[False False]
   [ True False]]]


...


 [[[False False]
   [False False]]

  [[False False]
   [ True False]]

  [[False False]
   [ True False]]]]

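# any() compares the values within each group: True if at least one is True, otherwise False
# all() keeps only the rows where every value is True
# np.where() returns the row indices selected by all()
# (A and B here come from a different random draw than the run above, so the indices differ)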
rows = np.where(C.any((3,1)).all(1))[0]
print(rows)

>>>
[0 1 2 4]

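Elements marked with the same color are compared: if at least one of them is True the result is True, otherwise False.
In the case shown in the image, yellow (row 0) returns True and red (row 1) returns False.
Because of all(), this group ends up False and does not appear in the final result.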
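
As a cross-check on the broadcasting logic, the sketch below compares the vectorized answer with a plain-Python loop over rows. The seed is arbitrary and only makes the example repeatable; `rows_loop` is a name introduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 5, (8, 3))
B = rng.integers(0, 5, (2, 2))

# Vectorized solution from the post
C = (A[..., np.newaxis, np.newaxis] == B)
rows = np.where(C.any((3, 1)).all(1))[0]

# Reference loop: a row of A qualifies when it shares
# at least one value with every row of B
rows_loop = [i for i, a in enumerate(A)
             if all(set(a.tolist()) & set(b.tolist()) for b in B)]

print(rows, rows_loop)
```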

94. Considering a 10x3 matrix, extract rows with unequal values (e.g. [2,2,3]) (★★★)

Z = np.random.randint(0,5,(10,3))
# solution for arrays of all dtypes (including string arrays and record arrays)
E = np.all(Z[:,1:] == Z[:,:-1], axis=1)
U = Z[~E]
print(U)
# solution for numerical arrays only, will work for any number of columns in Z
U = Z[Z.max(axis=1) != Z.min(axis=1),:]
print(U)

>>>
[[0 3 3]
 [0 3 1]
 [3 2 0]
 [4 0 4]
 [4 1 3]
 [1 3 3]
 [0 0 4]
 [0 2 4]
 [2 2 0]
 [2 2 1]]
 
[[0 3 3]
 [0 3 1]
 [3 2 0]
 [4 0 4]
 [4 1 3]
 [1 3 3]
 [0 0 4]
 [0 2 4]
 [2 2 0]
 [2 2 1]]
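
In the run above none of the ten random rows happened to be constant, so the filter removed nothing. A small hand-made matrix (an assumption for illustration, not part of the exercise) makes the effect visible:

```python
import numpy as np

Z = np.array([[1, 1, 1],
              [2, 2, 3],
              [4, 4, 4],
              [0, 1, 2]])

# True where every element of a row equals its right-hand neighbor
all_equal = np.all(Z[:, 1:] == Z[:, :-1], axis=1)
U = Z[~all_equal]
print(U)

# The min/max variant selects the same rows for numerical arrays
U2 = Z[Z.max(axis=1) != Z.min(axis=1), :]
print(np.array_equal(U, U2))
```

Only the rows [2 2 3] and [0 1 2] survive, because the other two rows are constant.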

95. Convert a vector of ints into a matrix binary representation (★★★)

Convert a vector of integers into a matrix of their binary representations.
- Build the binary representation with reshape and a bitwise AND
- Use the np.unpackbits() function

# Final solution

I = np.array([0, 1, 2, 3, 15, 16, 32, 64, 128])
B = ((I.reshape(-1,1) & (2**np.arange(8))) != 0).astype(int)
# np.arange() lists the bits least-significant first, the reverse of the usual binary notation, so [:,::-1] flips the columns
print(B[:,::-1])

# Using np.unpackbits()
I = np.array([0, 1, 2, 3, 15, 16, 32, 64, 128], dtype=np.uint8)
print(np.unpackbits(I[:, np.newaxis], axis=1))

Code explanation (first snippet)

# Bitwise AND against the powers of two keeps each set bit at its place value (binary representation)
(I.reshape(-1,1) & (2**np.arange(8)))

>>>
array([[  0,   0,   0,   0,   0,   0,   0,   0],
       [  1,   0,   0,   0,   0,   0,   0,   0],
       [  0,   2,   0,   0,   0,   0,   0,   0],
       [  1,   2,   0,   0,   0,   0,   0,   0],
       [  1,   2,   4,   8,   0,   0,   0,   0],
       [  0,   0,   0,   0,  16,   0,   0,   0],
       [  0,   0,   0,   0,   0,  32,   0,   0],
       [  0,   0,   0,   0,   0,   0,  64,   0],
       [  0,   0,   0,   0,   0,   0,   0, 128]], dtype=int32)

# Compare with 0 to get booleans marking the non-zero entries, then use .astype(int) to turn them into 1s and 0s
((I.reshape(-1,1) & (2**np.arange(8))) != 0)

>>>
array([[False, False, False, False, False, False, False, False],
       [ True, False, False, False, False, False, False, False],
       [False,  True, False, False, False, False, False, False],
       [ True,  True, False, False, False, False, False, False],
       [ True,  True,  True,  True, False, False, False, False],
       [False, False, False, False,  True, False, False, False],
       [False, False, False, False, False,  True, False, False],
       [False, False, False, False, False, False,  True, False],
       [False, False, False, False, False, False, False,  True]])
       
((I.reshape(-1,1) & (2**np.arange(8))) != 0).astype(int)

>>>
array([[0, 0, 0, 0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 0],
       [0, 0, 0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 1]])
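
One way to convince yourself that the flipped matrix really is the binary representation is to compare each row with `np.binary_repr`. This is only a verification sketch; the loop and the `as_text` name are additions for this example.

```python
import numpy as np

I = np.array([0, 1, 2, 3, 15, 16, 32, 64, 128])
B = ((I.reshape(-1, 1) & (2 ** np.arange(8))) != 0).astype(int)[:, ::-1]

# Join each row of bits into a string and compare it with np.binary_repr
for value, bits in zip(I, B):
    as_text = "".join(str(int(b)) for b in bits)
    print(int(value), as_text, as_text == np.binary_repr(int(value), width=8))
```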

96. Given a two dimensional array, how to extract unique rows? (★★★)

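Given a two-dimensional array, how do you extract its unique rows?
np.unique() with axis=0 extracts the unique (non-duplicated) rows.
Before NumPy 1.13, which introduced the axis argument, a different approach is needed.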

Z = np.random.randint(0,2,(6,3))
uZ = np.unique(Z, axis=0)
print(uZ)

# Before NumPy 1.13

Z = np.random.randint(0,2,(6,3))
T = np.ascontiguousarray(Z).view(np.dtype((np.void, Z.dtype.itemsize * Z.shape[1])))
_, idx = np.unique(T, return_index=True)
uZ = Z[idx]
print(uZ)
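
Because the arrays above are random, the effect is easier to see on a fixed input. The sketch below uses a hand-made array (an assumption for illustration) and also asks for `return_counts` to show how often each row occurs.

```python
import numpy as np

Z = np.array([[0, 1, 0],
              [1, 1, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1]])

uZ, counts = np.unique(Z, axis=0, return_counts=True)
print(uZ)      # unique rows, sorted lexicographically
print(counts)  # number of occurrences of each unique row
```

Note that `np.unique` returns the rows in sorted order, not in order of first appearance.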

97. Considering 2 vectors A & B, write the einsum equivalent of inner, outer, sum, and mul function (★★★)

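Given two vectors A and B, write the einsum equivalents of the inner product, outer product, sum, and elementwise multiplication.
einsum reference (recommended by the author of the exercises): https://ajcr.net/Basic-guide-to-einsum/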


A = np.random.uniform(0,1,10)
B = np.random.uniform(0,1,10)

np.einsum('i->', A)       # np.sum(A)
np.einsum('i,i->i', A, B) # A * B
np.einsum('i,i', A, B)    # np.inner(A, B)
np.einsum('i,j->ij', A, B)    # np.outer(A, B)
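
The comments above claim that each einsum call matches a familiar NumPy function. A short sketch to check that numerically (the seed and the `checks` dictionary are additions for this example):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.uniform(0, 1, 10)
B = rng.uniform(0, 1, 10)

checks = {
    "sum":   np.allclose(np.einsum('i->', A), np.sum(A)),
    "mul":   np.allclose(np.einsum('i,i->i', A, B), A * B),
    "inner": np.allclose(np.einsum('i,i', A, B), np.inner(A, B)),
    "outer": np.allclose(np.einsum('i,j->ij', A, B), np.outer(A, B)),
}
print(checks)
```

The rule of thumb is that an index missing from the output side is summed over, while an index that appears in the output is kept.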

98. Considering a path described by two vectors (X,Y), how to sample it using equidistant samples (★★★)?

Given a path described by two vectors (X,Y), how do you sample it at equidistant points?

phi = np.arange(0, 10*np.pi, 0.1)
a = 1
x = a*phi*np.cos(phi)
y = a*phi*np.sin(phi)

dr = (np.diff(x)**2 + np.diff(y)**2)**.5 # segment lengths
r = np.zeros_like(x)
r[1:] = np.cumsum(dr)                # integrate path
r_int = np.linspace(0, r.max(), 200) # regular spaced path
x_int = np.interp(r_int, r, x)       # interpolate x at the regular path positions
y_int = np.interp(r_int, r, y)
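
The same steps can be wrapped in a small helper and tried on a path where the answer is obvious. The function name `resample_equidistant` and the straight-line test path are assumptions made for this sketch.

```python
import numpy as np

def resample_equidistant(x, y, n):
    """Return n points spaced evenly along the polyline through (x, y)."""
    dr = np.hypot(np.diff(x), np.diff(y))       # segment lengths
    r = np.concatenate(([0.0], np.cumsum(dr)))  # cumulative path length
    r_int = np.linspace(0, r[-1], n)            # evenly spaced path positions
    return np.interp(r_int, r, x), np.interp(r_int, r, y)

# Unevenly sampled straight line from (0, 0) to (10, 0)
x = np.array([0.0, 1.0, 3.0, 10.0])
y = np.zeros_like(x)
x_int, y_int = resample_equidistant(x, y, 6)
print(x_int)
```

On the straight line the six resampled points land at x = 0, 2, 4, 6, 8, 10. On a curved path such as the spiral, the points are evenly spaced along the piecewise-linear path, so the straight-line distance between neighbors can be slightly shorter where the curve bends sharply.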

99. Given an integer n and a 2D array X, select from X the rows which can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows which only contain integers and which sum to n. (★★★)

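Given an integer n and a 2D array X, select from X the rows that can be interpreted as draws from a multinomial distribution with n degrees, i.e., the rows that contain only integers and sum to n.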

X = np.asarray([[1.0, 0.0, 3.0, 8.0],
                [2.0, 0.0, 1.0, 1.0],
                [1.5, 2.5, 1.0, 0.0]])
n = 4
M = np.logical_and.reduce(np.mod(X, 1) == 0, axis=-1)
M &= (X.sum(axis=-1) == n)
print(X[M])
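
To see the filter accept real multinomial draws, the sketch below mixes rows generated by `Generator.multinomial` with two rows that should be rejected. The helper name `multinomial_rows`, the seed, and the extra rows are assumptions for illustration.

```python
import numpy as np

def multinomial_rows(X, n):
    """Boolean mask of rows that hold only integers and sum to n."""
    X = np.asarray(X)
    return np.all(np.mod(X, 1) == 0, axis=-1) & (X.sum(axis=-1) == n)

rng = np.random.default_rng(0)
draws = rng.multinomial(4, [0.25] * 4, size=5).astype(float)  # genuine draws with n = 4
X = np.vstack([draws,
               [1.5, 2.5, 0.0, 0.0],    # sums to 4 but is not integer-valued
               [1.0, 0.0, 3.0, 8.0]])   # integer-valued but sums to 12

mask = multinomial_rows(X, 4)
print(mask)
```

The five generated rows pass and the two hand-written rows are rejected, each failing one of the two conditions.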

100. Compute bootstrapped 95% confidence intervals for the mean of a 1D array X (i.e., resample the elements of an array with replacement N times, compute the mean of each sample, and then compute percentiles over the means). (★★★)

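Compute a bootstrapped 95% confidence interval for the mean of a 1D array X (i.e., resample the elements of the array with replacement N times, compute the mean of each sample, then compute percentiles over the means).
Bootstrapping: computing a statistic by resampling with replacement (values that were already drawn can be drawn again) from the given sample data.
- Related post (learning carrot's blog): https://learningcarrot.wordpress.com/2015/11/12/%EB%B6%80%ED%8A%B8%EC%8A%A4%ED%8A%B8%EB%9E%A9%EC%97%90-%EB%8C%80%ED%95%98%EC%97%AC-bootstrapping/
np.percentile() returns the value at a given percentile, expressed in % units.
- Official np.percentile documentation: https://numpy.org/doc/stable/reference/generated/numpy.percentile.html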


X = np.random.randn(100) # random 1D array
N = 1000 # number of bootstrap samples
idx = np.random.randint(0, X.size, (N, X.size))
means = X[idx].mean(axis=1)
confint = np.percentile(means, [2.5, 97.5]) # 2.5th and 97.5th percentiles of the bootstrap means
print(confint)

>>>
[-0.06785416  0.3349777 ]
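
The interval printed above changes on every run because no seed is set. A reproducible variant, sketched here with NumPy's `Generator` API (the seed value 42 is arbitrary), also prints the sample mean so you can see that it falls inside the interval:

```python
import numpy as np

rng = np.random.default_rng(42)             # the seed value is arbitrary
X = rng.standard_normal(100)                # random 1D array
N = 1000                                    # number of bootstrap samples

idx = rng.integers(0, X.size, (N, X.size))  # indices drawn with replacement
means = X[idx].mean(axis=1)                 # one mean per bootstrap sample
lo, hi = np.percentile(means, [2.5, 97.5])
print(lo, X.mean(), hi)
```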
