Selecting Columns in Python

admin, April 2, 2026, 00:23

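In Python data processing, column selection is usually done with the pandas library. `df[['col1', 'col2']]` selects multiple columns and returns a DataFrame, while `df['col']` selects a single column and returns a Series. Boolean indexing such as `df[df['col'] > value]` filters rows rather than columns, so combine it with a column selection when you need both. `df.loc[:, 'col']` selects columns by label and `df.iloc[:, position]` selects them by integer position; when slicing, `loc` includes the end label, while `iloc` excludes the end position. Together these methods cover most column extraction needs in data cleaning and analysis.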

A Complete Guide to Column Selection in Python: Precise Extraction from Basics to Advanced

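Extracting specific columns efficiently is a core operation in data processing and analysis. The pandas library, built around the flexible DataFrame structure, offers several ways to do it: selecting a single column, selecting multiple columns, filtering by name pattern or data type, and selecting by position or label. This article walks through these methods, from basic operations to more advanced techniques.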

Preparation: Environment Setup and Sample Data

First make sure pandas is installed (`pip install pandas`), then create a sample DataFrame for the demonstrations:

import pandas as pd

# Build the sample dataset
data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR']
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)

Output:

Original DataFrame:
      name  age gender  salary department
0    Alice   25      F   50000         HR
1      Bob   30      M   60000         IT
2  Charlie   35      M   70000    Finance
3    David   28      M   55000         IT
4      Eve   32      F   65000         HR
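
Before selecting anything, it can help to check which column names, positions, and dtypes the DataFrame actually has, since the sections of this article select by each of these. The following is a minimal sketch that rebuilds the sample data so it runs on its own; the `positions` dictionary is a helper introduced here for illustration only.

```python
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# Column labels in positional order (position 0 is 'name')
column_names = df.columns.tolist()
print(column_names)

# Illustrative helper: map each column label to its integer position
positions = {label: pos for pos, label in enumerate(df.columns)}
print(positions)

# dtype of each column; the dtype reported for text columns
# may differ between pandas versions
print(df.dtypes)
```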

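Basic Column Selection: Single and Multiple Columns

Single column: access by column name

To select a single column, index the DataFrame with the column name as a string. The result is a pandas Series: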

# Select the 'name' column
name_col = df['name']
print("Single-column result:")
print(name_col)
print("\nType:", type(name_col))

Output:

Single-column result:
0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: name, dtype: object

Type: <class 'pandas.core.series.Series'>

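Multiple columns: a list of column names in double brackets

To select multiple columns, put the column names in a list and index with it, which produces the **double square brackets** `[[]]` form. The result is a DataFrame: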

# Select the 'name' and 'salary' columns
multi_cols = df[['name', 'salary']]
print("Multi-column result:")
print(multi_cols)
print("\nType:", type(multi_cols))

Output:

Multi-column result:
      name  salary
0    Alice   50000
1      Bob   60000
2  Charlie   70000
3    David   55000
4      Eve   65000

Type: <class 'pandas.core.frame.DataFrame'>
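
The difference between single and double brackets is easy to get wrong, so here is a small sketch of the edge cases, assuming the same sample data (rebuilt in reduced form so the snippet runs on its own). It shows that a one-element list still returns a DataFrame, that the result follows the order of the list, and that passing two names without the inner list raises a `KeyError` on a DataFrame with ordinary (non-MultiIndex) columns.

```python
import pandas as pd

# Reduced copy of the sample data so this snippet is self-contained
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'salary': [50000, 60000, 70000, 55000, 65000],
})

# Single brackets with a string: a Series
single = df['name']

# Double brackets with a one-element list: a one-column DataFrame
as_frame = df[['name']]

# The result follows the order of the list, not the original column order
reordered = df[['salary', 'name']]

# Without the inner list, pandas looks for one column labeled
# ('name', 'salary'), which does not exist here
try:
    df['name', 'salary']
    raised = False
except KeyError:
    raised = True

print(type(single).__name__, type(as_frame).__name__)
print(reordered.columns.tolist())
print("KeyError raised:", raised)
```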

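Smarter Column Filtering: By Name Pattern and Data Type

Filter by column name: substring and regular-expression matching

The `filter()` method selects columns whose names match a pattern: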

# Select columns whose name contains 'dep'
dept_cols = df.filter(like='dep')
print("Columns whose name contains 'dep':")
print(dept_cols)
# Select columns whose name starts with 'sal'
sal_cols = df.filter(regex='^sal')
print("\nColumns whose name starts with 'sal':")
print(sal_cols)

Output:

Columns whose name contains 'dep':
  department
0         HR
1         IT
2    Finance
3         IT
4         HR

Columns whose name starts with 'sal':
   salary
0   50000
1   60000
2   70000
3   55000
4   65000
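
Besides `like` and `regex`, `filter()` also accepts an `items` list of exact names. The sketch below, again on a rebuilt copy of the sample data, shows two behaviors worth knowing: names in `items` that do not exist are skipped rather than raising an error, and `regex` is applied as a search, so anchors such as `^` and `$` are needed to pin the match to the start or end of the name. The column name `'bonus'` is hypothetical and is deliberately absent from the data.

```python
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# Exact names; 'bonus' (hypothetical) is not a column and is silently skipped
exact = df.filter(items=['name', 'salary', 'bonus'])

# Names ending in 'e'
ends_with_e = df.filter(regex='e$')

# Names starting with either 'age' or 'sal'
age_or_sal = df.filter(regex='^(age|sal)')

print(exact.columns.tolist())
print(ends_with_e.columns.tolist())
print(age_or_sal.columns.tolist())
```

Because missing names are skipped, `df[['name', 'salary']]` is the safer choice when a missing column should be treated as an error.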

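Filter by data type: selecting columns of a given type

`select_dtypes()` selects columns by data type and supports both including and excluding specific types: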

# Select numeric columns (int/float)
numeric_cols = df.select_dtypes(include=['int', 'float'])
print("Numeric columns:")
print(numeric_cols)
# Select string columns (object)
str_cols = df.select_dtypes(include='object')
print("\nString columns:")
print(str_cols)

Output:

Numeric columns:
   age  salary
0   25   50000
1   30   60000
2   35   70000
3   28   55000
4   32   65000

String columns:
      name gender department
0    Alice      F         HR
1      Bob      M         IT
2  Charlie      M    Finance
3    David      M         IT
4      Eve      F         HR
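
The text above mentions that `select_dtypes()` can exclude types as well as include them. As a minimal sketch on a rebuilt copy of the sample data, the generic `'number'` selector matches all numeric dtypes at once, and `exclude='number'` returns everything else without having to name the dtype of the text columns.

```python
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# 'number' covers every numeric dtype (integers and floats of any width)
numeric = df.select_dtypes(include='number')

# The complement: every column that is not numeric
non_numeric = df.select_dtypes(exclude='number')

print(numeric.columns.tolist())
print(non_numeric.columns.tolist())
```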

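Index-Based Column Selection: By Position and Label

Select by column position: the iloc indexer

Use `iloc` to select columns by integer position, counting from 0: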

# Select the 1st column (position 0)
first_col = df.iloc[:, 0]
print("Column 1 (position 0):")
print(first_col)
# Select the 2nd through 4th columns (half-open slice: start included, end excluded)
cols_2_4 = df.iloc[:, 1:4]
print("\nColumns 2-4 (positions 1:4):")
print(cols_2_4)

Output:

Column 1 (position 0):
0      Alice
1        Bob
2    Charlie
3      David
4        Eve
Name: name, dtype: object

Columns 2-4 (positions 1:4):
   age gender  salary
0   25      F   50000
1   30      M   60000
2   35      M   70000
3   28      M   55000
4   32      F   65000
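
Positional selection is not limited to a single integer or a contiguous slice. The sketch below, on a rebuilt copy of the sample data, shows a list of positions, negative positions counted from the end, and `df.columns.get_loc()` for turning a column label into a position. Variable names such as `salary_pos` are illustrative only.

```python
import pandas as pd

# Rebuild the sample data so this snippet is self-contained
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'age': [25, 30, 35, 28, 32],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'salary': [50000, 60000, 70000, 55000, 65000],
    'department': ['HR', 'IT', 'Finance', 'IT', 'HR'],
})

# A list of positions selects non-adjacent columns; -1 is the last column
first_and_last = df.iloc[:, [0, -1]]

# A negative slice start selects the trailing columns
last_two = df.iloc[:, -2:]

# A one-element list keeps the result a DataFrame instead of a Series
one_col_frame = df.iloc[:, [0]]

# Convert a label to its position, then slice up to and including it
salary_pos = df.columns.get_loc('salary')
up_to_salary = df.iloc[:, :salary_pos + 1]

print(first_and_last.columns.tolist())
print(last_two.columns.tolist())
print(salary_pos, up_to_salary.columns.tolist())
```

The `+ 1` in the last slice is needed because an `iloc` slice excludes its end position.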