다운로드
작성자: admin 작성일시: 2016-07-08 14:00:04 조회수: 9858 다운로드: 283
카테고리: Python 태그목록:

Python과 엑셀 연동

In:
%mkdir -p ./data
%rm -rf ./data/*.xls
%rm -rf ./data/*.xlsx

Excel 연동을 위한 Python 패키지

  • xlwt
  • xlrd
  • xlsxwriter
  • OpenPyxl

xlwt/xlrd 패키지

  • Excel 쓰기 및 읽기
  • COM(Component Object Model)을 사용하지 않고 직접 Excel Format의 Binary Data를 생성
  • Excel이 설치되어 있기 않아도 사용 가능
  • Linux에서도 사용 가능
  • Excel 97 / 2000 / XP / 2003 포맷(.xls)만 지원
In:
import xlwt, xlrd

.xls Workbook 생성

In:
wb = xlwt.Workbook()
wb
Out:

Worksheet 생성

In:
ws_1 = wb.add_sheet('first_sheet', cell_overwrite_ok=True)
In:
ws_1 = wb.get_sheet(0)
ws_1
Out:
In:
ws_2 = wb.add_sheet('second_sheet')
ws_2
Out:
In:
data = np.arange(1, 65).reshape((8, 8))
data
Out:
array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 9, 10, 11, 12, 13, 14, 15, 16],
       [17, 18, 19, 20, 21, 22, 23, 24],
       [25, 26, 27, 28, 29, 30, 31, 32],
       [33, 34, 35, 36, 37, 38, 39, 40],
       [41, 42, 43, 44, 45, 46, 47, 48],
       [49, 50, 51, 52, 53, 54, 55, 56],
       [57, 58, 59, 60, 61, 62, 63, 64]])

Cell 쓰기

In:
ws_1.write(0, 0, 100)
In:
for c in range(data.shape[0]):
    for r in range(data.shape[1]):
        ws_1.write(r, c, data[c, r])
        ws_2.write(r, c, data[r, c])

.xls 파일 쓰기

In:
wb.save('./data/workbook.xls')
In:
%ls -Fcal data
total 16
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:45 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:48 ../
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls

.xls 파일 읽기

In:
book = xlrd.open_workbook('data/workbook.xls')
book
Out:
In:
book.sheet_names()
Out:
[u'first_sheet', u'second_sheet']
In:
sheet_1 = book.sheet_by_name('first_sheet')
sheet_2 = book.sheet_by_index(1)
sheet_1, sheet_2
Out:
(, )
In:
sheet_2.name
Out:
u'second_sheet'
In:
sheet_1.ncols, sheet_1.nrows
Out:
(8, 8)

Sheet Method

  • cell()
  • row()
  • col()
  • row_values()
  • col_values()
In:
cl = sheet_1.cell(0, 0)
cl.value
Out:
1.0
In:
cl.ctype
Out:
2

cell type

유형 숫자 파이썬 타입
XL_CELL_EMPTY 0 Empty string 빈 문자열
XL_CELL_TEXT 1 A Unicode string 유니코드 문자열
XL_CELL_NUMBER 2 float 부동소수점
XL_CELL_DATE 3 float 부동소수점
XL_CELL_BOOLEAN 4 int (1=TRUE, 0=FALSE) 정수
XL_CELL_ERROR 5 int 내부 엑셀 코드를 표시하는 정수
XL_CELL_BLANK 6 빈 문자열 (formatting_info=True 인 경우)
In:
sheet_2.row(3)
Out:
[number:25.0,
 number:26.0,
 number:27.0,
 number:28.0,
 number:29.0,
 number:30.0,
 number:31.0,
 number:32.0]
In:
sheet_2.col(3)
Out:
[number:4.0,
 number:12.0,
 number:20.0,
 number:28.0,
 number:36.0,
 number:44.0,
 number:52.0,
 number:60.0]
In:
sheet_1.col_values(3, start_rowx=3, end_rowx=7)
Out:
[28.0, 29.0, 30.0, 31.0]
In:
for c in range(sheet_1.ncols):
    for r in range(sheet_1.nrows):
        print('%2i' % sheet_1.cell(r, c).value, end=",")
    print()
 1, 2, 3, 4, 5, 6, 7, 8,
 9,10,11,12,13,14,15,16,
17,18,19,20,21,22,23,24,
25,26,27,28,29,30,31,32,
33,34,35,36,37,38,39,40,
41,42,43,44,45,46,47,48,
49,50,51,52,53,54,55,56,
57,58,59,60,61,62,63,64,

xlsxwriter 패키지

  • http://xlsxwriter.readthedocs.org/
  • COM(Component Object Model)을 사용하지 않고 직접 Excel Format의 Binary Data를 생성
  • 최신 .xlsx 포맷도 지원
  • chart 생성 기능 지원
In:
import xlsxwriter

.xlsx Workbook 생성

In:
wb = xlsxwriter.Workbook('data/workbook.xlsx')
wb
Out:
In:
ws_1 = wb.add_worksheet('first_sheet')
ws_2 = wb.add_worksheet('second_sheet')
ws_1, ws_2
Out:
(,
 )
In:
for c in range(data.shape[0]):
    for r in range(data.shape[1]):
         ws_1.write(r, c, data[c, r])
         ws_2.write(r, c, data[r, c])
In:
wb.close()
In:
%ls -Fcal ./data
total 24
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:49 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:49 ../
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls
-rw-r--r-- 1 dockeruser dockeruser 6050 Jul  8 04:49 workbook.xlsx

챠트 추가

In:
wb = xlsxwriter.Workbook('data/chart.xlsx')
ws = wb.add_worksheet()

# 첫번째 열에는 숫자의 누적합을 쓴다.
values = np.random.standard_normal(15).cumsum()
ws.write_column('A1', values)

# 차트 객체를 새로 생성한다.
chart = wb.add_chart({'type': 'line'})

# 차트에 계열을 추가한다.
chart.add_series({'values': '=Sheet1!$A$1:$A$15',
                  'marker': {'type': 'diamond'},})
# 계열 마크 설정 (다이아몬드)

# 차트 삽입
ws.insert_chart('C1', chart)
wb.close()
In:
%ls -Fcal data
total 32
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:50 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:50 ../
-rw-r--r-- 1 dockeruser dockeruser 7375 Jul  8 04:50 chart.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls
-rw-r--r-- 1 dockeruser dockeruser 6050 Jul  8 04:49 workbook.xlsx

openpyxl 패키지

In:
import openpyxl as oxl
In:
wb = oxl.Workbook()
wb
Out:
In:
ws = wb.create_sheet(index=0, title='oxl_sheet')
ws
Out:
In:
for c in range(data.shape[0]):
    for r in range(data.shape[1]):
        ws.cell(row=r+1, column=c+1).value = data[c, r]
        # 셀 객체를 생성하고 값을 할당한다.
In:
wb.save('./data/oxl_book.xlsx')
In:
%ls -Fcal ./data
total 40
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:53 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:53 ../
-rw-r--r-- 1 dockeruser dockeruser 7375 Jul  8 04:50 chart.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5782 Jul  8 04:53 oxl_book.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls
-rw-r--r-- 1 dockeruser dockeruser 6050 Jul  8 04:49 workbook.xlsx
In:
wb = oxl.load_workbook('./data/oxl_book.xlsx')
wb
Out:
In:
ws = wb.active
ws
Out:
In:
cell = ws['B4']
cell.column, cell.row, cell.value
Out:
('B', 4, 12L)
In:
for cell in ws['B1':'B4']:
    print(cell[0].value)
9
10
11
12
In:
for row in ws.iter_rows('B1:C4'):
    for cell in row:
        print(cell.value, end=",")
9,17,10,18,11,19,12,20,

pandas를 사용한 Excel 읽고 쓰기

  • pandas 는 xlwt/xlrd 내장
  • read_excel()
  • ExcelWriter
In:
df_1 = pd.read_excel('data/workbook.xlsx', 'first_sheet', header=None)
df_1
Out:
0 1 2 3 4 5 6 7
0 1 9 17 25 33 41 49 57
1 2 10 18 26 34 42 50 58
2 3 11 19 27 35 43 51 59
3 4 12 20 28 36 44 52 60
4 5 13 21 29 37 45 53 61
5 6 14 22 30 38 46 54 62
6 7 15 23 31 39 47 55 63
7 8 16 24 32 40 48 56 64
In:
df_2 = pd.read_excel('data/workbook.xlsx', 'second_sheet', header=None)
df_2
Out:
0 1 2 3 4 5 6 7
0 1 2 3 4 5 6 7 8
1 9 10 11 12 13 14 15 16
2 17 18 19 20 21 22 23 24
3 25 26 27 28 29 30 31 32
4 33 34 35 36 37 38 39 40
5 41 42 43 44 45 46 47 48
6 49 50 51 52 53 54 55 56
7 57 58 59 60 61 62 63 64
In:
import string
columns = []
for c in range(data.shape[0]):
    columns.append(string.uppercase[c])
df_1.columns = columns
df_2.columns = columns
In:
df_1
Out:
A B C D E F G H
0 1 9 17 25 33 41 49 57
1 2 10 18 26 34 42 50 58
2 3 11 19 27 35 43 51 59
3 4 12 20 28 36 44 52 60
4 5 13 21 29 37 45 53 61
5 6 14 22 30 38 46 54 62
6 7 15 23 31 39 47 55 63
7 8 16 24 32 40 48 56 64
In:
df_2
Out:
A B C D E F G H
0 1 2 3 4 5 6 7 8
1 9 10 11 12 13 14 15 16
2 17 18 19 20 21 22 23 24
3 25 26 27 28 29 30 31 32
4 33 34 35 36 37 38 39 40
5 41 42 43 44 45 46 47 48
6 49 50 51 52 53 54 55 56
7 57 58 59 60 61 62 63 64
In:
df_1.to_excel('./data/new_book_1.xlsx', 'my_sheet')
In:
%ls -Fcal ./data
total 48
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:55 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:55 ../
-rw-r--r-- 1 dockeruser dockeruser 7375 Jul  8 04:50 chart.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5811 Jul  8 04:55 new_book_1.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5782 Jul  8 04:53 oxl_book.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls
-rw-r--r-- 1 dockeruser dockeruser 6050 Jul  8 04:49 workbook.xlsx
In:
wbn = xlrd.open_workbook('./data/new_book_1.xlsx')
wbn.sheet_names()
Out:
[u'my_sheet']
In:
wbw = pd.ExcelWriter('./data/new_book_2.xlsx')
df_1.to_excel(wbw, 'first_sheet')
df_2.to_excel(wbw, 'second_sheet')
wbw.save()
In:
%ls -Fcal ./data
total 56
drwxr-xr-x 2 dockeruser dockeruser 4096 Jul  8 04:56 ./
drwxr-xr-x 4 dockeruser dockeruser 4096 Jul  8 04:55 ../
-rw-r--r-- 1 dockeruser dockeruser 7375 Jul  8 04:50 chart.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5811 Jul  8 04:55 new_book_1.xlsx
-rw-r--r-- 1 dockeruser dockeruser 6664 Jul  8 04:56 new_book_2.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5782 Jul  8 04:53 oxl_book.xlsx
-rw-r--r-- 1 dockeruser dockeruser 5632 Jul  8 04:48 workbook.xls
-rw-r--r-- 1 dockeruser dockeruser 6050 Jul  8 04:49 workbook.xlsx
In:
wbn = xlrd.open_workbook('./data/new_book_2.xlsx')
wbn.sheet_names()
Out:
[u'first_sheet', u'second_sheet']
In:
data = np.random.rand(20, 100000)
data.nbytes
Out:
16000000
In:
df = pd.DataFrame(data)
df
Out:
0 1 2 3 4 5 6 7 8 9 ... 99990 99991 99992 99993 99994 99995 99996 99997 99998 99999
0 0.793100 0.290056 0.583396 0.966430 0.923651 0.322442 0.154318 0.265497 0.264346 0.104558 ... 0.936211 0.981311 0.055813 0.404973 0.682393 0.847683 0.604535 0.751707 0.318052 0.118904
1 0.592149 0.977925 0.487636 0.140595 0.579268 0.624574 0.386529 0.672645 0.888648 0.925261 ... 0.246856 0.311114 0.963610 0.450400 0.996286 0.664110 0.169628 0.868465 0.378138 0.559427
2 0.556800 0.875934 0.689262 0.289976 0.006450 0.996536 0.615216 0.465078 0.532218 0.737862 ... 0.431658 0.490750 0.088388 0.844733 0.770193 0.293460 0.778149 0.192933 0.337801 0.107539
3 0.020234 0.969383 0.193909 0.874451 0.451009 0.206470 0.661639 0.424887 0.006143 0.290917 ... 0.575013 0.918006 0.565021 0.030208 0.944737 0.644644 0.203202 0.240416 0.335166 0.564975
4 0.094824 0.314200 0.898915 0.017384 0.672363 0.047202 0.132170 0.273767 0.902077 0.181586 ... 0.927757 0.753796 0.505352 0.039284 0.959797 0.127906 0.212912 0.226325 0.390042 0.319557
5 0.249626 0.477221 0.187811 0.590910 0.228333 0.204200 0.210370 0.073378 0.601264 0.053763 ... 0.862836 0.158814 0.701950 0.560324 0.870025 0.815638 0.584403 0.498614 0.712390 0.536535
6 0.036739 0.987864 0.304772 0.054295 0.444393 0.368979 0.163225 0.954481 0.900881 0.307288 ... 0.212329 0.731505 0.510076 0.538380 0.531677 0.408143 0.356996 0.686571 0.703199 0.594908
7 0.155551 0.102987 0.791929 0.897123 0.634879 0.451910 0.702932 0.043877 0.529243 0.767344 ... 0.101194 0.989258 0.156463 0.389187 0.884330 0.686334 0.642736 0.147801 0.862026 0.528545
8 0.785275 0.608935 0.474743 0.708921 0.767318 0.194513 0.688725 0.237486 0.761624 0.949359 ... 0.448369 0.455015 0.187843 0.272060 0.437672 0.598499 0.754772 0.363502 0.443522 0.729368
9 0.661803 0.047524 0.259578 0.977401 0.315901 0.853187 0.216957 0.712201 0.393311 0.340896 ... 0.518859 0.901894 0.727146 0.981920 0.521907 0.576952 0.333319 0.366821 0.379062 0.592537
10 0.193371 0.755933 0.921434 0.022184 0.897012 0.278873 0.942517 0.545123 0.780738 0.967167 ... 0.965361 0.808522 0.021221 0.085114 0.575308 0.488282 0.714469 0.650973 0.790076 0.477692
11 0.780899 0.587837 0.180125 0.687044 0.492060 0.572677 0.985433 0.663325 0.080520 0.433118 ... 0.905863 0.553130 0.207558 0.521882 0.481516 0.759161 0.050874 0.338774 0.295547 0.180345
12 0.317469 0.267338 0.828032 0.112232 0.265220 0.728244 0.779422 0.649815 0.136367 0.035565 ... 0.283905 0.050874 0.744962 0.970457 0.776022 0.405116 0.183992 0.386503 0.265440 0.705505
13 0.573384 0.868479 0.973574 0.768788 0.964174 0.850699 0.496993 0.180656 0.025995 0.392899 ... 0.155902 0.105075 0.669590 0.467595 0.000023 0.461704 0.126067 0.249849 0.913891 0.052940
14 0.454735 0.901256 0.041397 0.331318 0.162939 0.761066 0.587393 0.552281 0.428264 0.182178 ... 0.639741 0.110956 0.320534 0.752522 0.336629 0.968374 0.596937 0.016608 0.308348 0.372741
15 0.039257 0.676711 0.800690 0.915406 0.142485 0.051275 0.830822 0.832145 0.980107 0.906851 ... 0.033934 0.358502 0.512860 0.262180 0.465864 0.950770 0.032259 0.591920 0.492434 0.572393
16 0.840164 0.967752 0.240112 0.007850 0.215525 0.281160 0.308040 0.400088 0.331418 0.257642 ... 0.077701 0.082775 0.870017 0.715198 0.295637 0.314729 0.872157 0.283856 0.523854 0.621991
17 0.434295 0.315422 0.411673 0.397073 0.953368 0.152800 0.192639 0.042227 0.606172 0.024539 ... 0.511029 0.032805 0.508156 0.845321 0.873530 0.846385 0.079159 0.169301 0.929852 0.279567
18 0.310509 0.753710 0.494310 0.288319 0.722850 0.600826 0.874846 0.314385 0.570024 0.538693 ... 0.739287 0.973737 0.230978 0.917690 0.440219 0.715325 0.711091 0.792424 0.453736 0.397569
19 0.590204 0.750580 0.443926 0.540892 0.296287 0.043844 0.603139 0.518104 0.146806 0.553369 ... 0.216828 0.112292 0.756146 0.893212 0.925478 0.307175 0.938525 0.312391 0.767320 0.633252

20 rows × 100000 columns

In:
%time df.to_excel('data/data.xlsx', 'data_sheet')
CPU times: user 25.8 s, sys: 0 ns, total: 25.8 s
Wall time: 26.1 s
In:
%time df = pd.read_excel('data/data.xlsx', 'data_sheet')
CPU times: user 3.38 s, sys: 0 ns, total: 3.38 s
Wall time: 3.38 s

기타 Excel 패키지

  • PyXLL
  • DataNitro
  • xlwings
  • COM(Component Object Model) 방식
  • 백그라운드로 Excel을 가동시키고 통신하는 방식
  • Windows OS 에서 Excel이 설치되어 있는 경우에만 가능
  • Interactive 동작, VBA script 연동 등이 가능
In:
%rm -rf ./data/*.xls
%rm -rf ./data/*.xlsx

질문/덧글

아직 질문이나 덧글이 없습니다. 첫번째 글을 남겨주세요!