Coding Study/Pandas 2024. 2. 6. 23:48

For Uploading a Blog

Importing and exporting data (2024-02-06)¶

CSV¶

In [ ]:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 5, (10, 5)))  # 10 rows x 5 columns of random integers in [0, 5)
df.to_csv("foo.csv")

In [ ]:

pd.read_csv("foo.csv")

	Unnamed: 0	0	1	2	3	4
0	0	2	3	0	2	2
1	1	1	3	4	3	1
2	2	0	4	1	0	2
3	3	2	2	3	3	1
4	4	1	4	0	3	2
5	5	0	4	0	2	2
6	6	3	4	1	3	3
7	7	2	2	4	2	0
8	8	1	0	3	1	0
9	9	2	3	1	2	2

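Note:
When reading back from CSV, pandas does not recognize the original row index. The saved index comes back as an extra column (Unnamed: 0) and a fresh row index is generated.
As a result, the DataFrame that had 5 columns when it was saved shows 6 columns after loading.
There are two ways to avoid this. Pass index=False to to_csv when saving so that the index is not written at all, or pass index_col=0 to read_csv when loading so that the first column is used as the index and the saved index is restored directly.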
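
The two options in the note above can be compared side by side. This is a minimal sketch, assuming a temporary directory is acceptable for the file; the variable names `with_extra`, `restored`, and `no_index` are made up for illustration.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 5, (10, 5)))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.csv")

    # Default behavior: the row index is written as an unnamed first column.
    df.to_csv(path)
    with_extra = pd.read_csv(path)             # index comes back as "Unnamed: 0"
    restored = pd.read_csv(path, index_col=0)  # first column is used as the index

    # Alternative: do not write the index in the first place.
    df.to_csv(path, index=False)
    no_index = pd.read_csv(path)

print(with_extra.shape, restored.shape, no_index.shape)  # (10, 6) (10, 5) (10, 5)
```

Either way the values survive the round trip. One difference worth knowing is that the column labels are read back as strings ("0" to "4") rather than integers.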

Parquet¶

In [ ]:

df.to_parquet("foo.parquet")

In [ ]:

pd.read_parquet("foo.parquet")

	0	1	2	3	4
0	2	3	0	2	2
1	1	3	4	3	1
2	0	4	1	0	2
3	2	2	3	3	1
4	1	4	0	3	2
5	0	4	0	2	2
6	3	4	1	3	3
7	2	2	4	2	0
8	1	0	3	1	0
9	2	3	1	2	2

HDF5¶

In [ ]:

# df.to_hdf('foo.h5', 'df')  # seems to need an extra module installed; skipped since I don't expect to use it

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\compat\_optional.py:142, in import_optional_dependency(name, extra, errors, min_version)
    141 try:
--> 142     module = importlib.import_module(name)
    143 except ImportError:

File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\importlib\__init__.py:127, in import_module(name, package)
    126         level += 1
--> 127 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1014, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:991, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:973, in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'tables'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
Cell In[115], line 1
----> 1 df.to_hdf('foo.h5', 'df')

File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\generic.py:2682, in NDFrame.to_hdf(self, path_or_buf, key, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
   2678 from pandas.io import pytables
   2680 # Argument 3 to "to_hdf" has incompatible type "NDFrame"; expected
   2681 # "Union[DataFrame, Series]" [arg-type]
-> 2682 pytables.to_hdf(
   2683     path_or_buf,
   2684     key,
   2685     self,  # type: ignore[arg-type]
   2686     mode=mode,
   2687     complevel=complevel,
   2688     complib=complib,
   2689     append=append,
   2690     format=format,
   2691     index=index,
   2692     min_itemsize=min_itemsize,
   2693     nan_rep=nan_rep,
   2694     dropna=dropna,
   2695     data_columns=data_columns,
   2696     errors=errors,
   2697     encoding=encoding,
   2698 )

File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\pytables.py:302, in to_hdf(path_or_buf, key, value, mode, complevel, complib, append, format, index, min_itemsize, nan_rep, dropna, data_columns, errors, encoding)
    300 path_or_buf = stringify_path(path_or_buf)
    301 if isinstance(path_or_buf, str):
--> 302     with HDFStore(
    303         path_or_buf, mode=mode, complevel=complevel, complib=complib
    304     ) as store:
    305         f(store)
    306 else:

File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\io\pytables.py:560, in HDFStore.__init__(self, path, mode, complevel, complib, fletcher32, **kwargs)
    557 if "format" in kwargs:
    558     raise ValueError("format is not a defined argument for HDFStore")
--> 560 tables = import_optional_dependency("tables")
    562 if complib is not None and complib not in tables.filters.all_complibs:
    563     raise ValueError(
    564         f"complib only supports {tables.filters.all_complibs} compression."
    565     )

File c:\Users\kssg1\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\compat\_optional.py:145, in import_optional_dependency(name, extra, errors, min_version)
    143 except ImportError:
    144     if errors == "raise":
--> 145         raise ImportError(msg)
    146     return None
    148 # Handle submodules: if we have submodule, grab parent module from sys.modules

ImportError: Missing optional dependency 'pytables'.  Use pip or conda to install pytables.

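Fails when run as-is; an additional module has to be installed¶

(PyTables has to be installed before to_hdf works; the module it tries to import is `tables`. I skipped it since I don't expect to need it right now.)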
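
One way to avoid hitting the ImportError above is to check for the optional module before calling to_hdf. This is only a sketch: the helper name `hdf_available` is made up for illustration, and the HDF5 round trip runs only if the `tables` module happens to be importable in the current environment.

```python
import importlib.util
import os
import tempfile

import numpy as np
import pandas as pd


def hdf_available() -> bool:
    """Return True if the PyTables module ('tables') can be found."""
    return importlib.util.find_spec("tables") is not None


df = pd.DataFrame(np.random.randint(0, 5, (10, 5)))

if hdf_available():
    # Only attempt the HDF5 round trip when the dependency is present.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "foo.h5")
        df.to_hdf(path, key="df")
        loaded = pd.read_hdf(path, "df")
    print("HDF5 round trip OK:", loaded.shape)
else:
    print("PyTables is not installed; skipping the HDF5 example")
```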

GPT's answer about the unfamiliar file formats (+pickle)¶

https://chat.openai.com/share/ee9a3e69-bbd0-44c0-8e95-bf87a586f75c

Excel¶

In [ ]:

df.to_excel("foo.xlsx", sheet_name="Sheet1")

In [ ]:

pd.read_excel("foo.xlsx", "Sheet1", index_col=None, na_values=["NA"])

	Unnamed: 0	0	1	2	3	4
0	0	2	3	0	2	2
1	1	1	3	4	3	1
2	2	0	4	1	0	2
3	3	2	2	3	3	1
4	4	1	4	0	3	2
5	5	0	4	0	2	2
6	6	3	4	1	3	3
7	7	2	2	4	2	0
8	8	1	0	3	1	0
9	9	2	3	1	2	2

In [ ]:

import matplotlib.pyplot as plt
plt.close("all")  # closes all open figure windows; presumably called first to start from a clean state

In [ ]:

ts = pd.Series(np.random.randn(1000), index=pd.date_range("1/1/2000", periods=1000))
ts = ts.cumsum()
ts.plot()

<Axes: >

[Figure: line plot of ts, the cumulative sum of 1000 random values indexed by date from 2000-01-01]

cumsum returns the cumulative sum: each element of the result is the sum of all values up to and including that position.
https://pandas.pydata.org/docs/reference/api/pandas.Series.cumsum.html
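
A small example makes the behavior of cumsum concrete. The values below are made up for illustration and are not taken from the notebook.

```python
import pandas as pd

# On a Series, each element becomes the running total so far.
s = pd.Series([1, 2, 3, 4])
running = s.cumsum()
print(running.tolist())  # [1, 3, 6, 10]

# On a DataFrame, cumsum runs down each column independently by default.
frame = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})
totals = frame.cumsum()
print(totals["B"].tolist())  # [10, 30, 60]
```

This is why the plotted series looks like a random walk: each point is the previous total plus one new random value.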

In [ ]:

df = pd.DataFrame(
    np.random.randn(1000,4), index=ts.index, columns=["A", "B", "C", "D" ]
)
df = df.cumsum()
plt.figure();
df.plot();
plt.legend(loc='best');

<Figure size 640x480 with 0 Axes>

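Python code normally doesn't end lines with semicolons, so I wasn't sure why they are written here. In a notebook, a trailing semicolon suppresses the echoed return value of the last expression in a cell (such as the `<Axes: >` line in the earlier output).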
https://chat.openai.com/share/4ae0e28e-30a6-4f04-bf75-6eb8dfc91ec1
