Questions tagged [pandas]

pandas is a Python library for panel data (PAN-el DA-ta) manipulation and analysis, that is, multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science results, econometrics, or finance. pandas is implemented primarily using NumPy and Cython; it is designed to integrate easily with other NumPy-based scientific libraries, such as statsmodels.

Main Features:

  • Data structures: for 1, 2, and 3 dimensional labeled data sets (respectively Series, DataFrames and Panels; note that Panel was removed in pandas 0.25, and higher-dimensional data is now usually held in a DataFrame with a MultiIndex). Some of their main features include:
    • Automatic data alignment and interpolation
    • Handling of missing observations in calculations
    • Convenient slicing and reshaping ("reindexing") functions
    • Categorical data types
    • 'Group by' aggregation and transformation functionality
    • Tools for merging / joining data sets
    • Simple matplotlib integration for plotting
  • Date tools: objects for expressing date offsets or generating date ranges; some functionality similar to scikits.timeseries. Dates can be localized to a specific timezone and converted / compared at will
  • Statistical models: convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series / cross-sectional regressions. These were intended as a starting point for other models; in current releases this functionality lives in statsmodels rather than in pandas itself
  • Cython-optimized internals: performance-critical computations run in compiled code
  • Static and moving-window statistical tools: mean, standard deviation, correlation, covariance
  • Rich user documentation, built with Sphinx
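
A few of the features listed above can be sketched in a handful of lines. This is only an illustration: the Series labels, the column names `team` and `score`, and all values are made up for the example.

```python
import numpy as np
import pandas as pd

# Two Series with partially overlapping labels (illustrative data).
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])

# Arithmetic aligns on labels; "x" has no partner in b, so it becomes NaN.
total = a + b

# Reductions skip missing observations by default.
total_sum = total.sum()

# 'Group by' aggregation on a small DataFrame (hypothetical columns).
df = pd.DataFrame({"team": ["A", "A", "B"], "score": [1, 2, 5]})
means = df.groupby("team")["score"].mean()

print(total)
print(total_sum)
print(means)
```

Label alignment is what makes the addition safe here: values are matched by index label, not by position.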

1340 questions
77
votes
4 answers

Convert a list of lists into a Pandas Dataframe

I am trying to convert a list of lists which looks like the following into a Pandas Dataframe [['New York Yankees ', '"Acevedo Juan" ', 900000, ' Pitcher\n'], ['New York Yankees ', '"Anderson Jason"', 300000, ' Pitcher\n'], ['New York Yankees ',…
Aravind Veluchamy
  • 871
  • 1
  • 6
  • 3
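
For the list-of-lists question above, one possible approach is to pass the nested list straight to the `DataFrame` constructor. The sketch below uses the two rows visible in the excerpt; the column names are assumptions, since the excerpt does not give any, and the cleanup step is optional.

```python
import pandas as pd

# The first two rows shown in the question excerpt.
rows = [
    ["New York Yankees ", '"Acevedo Juan" ', 900000, " Pitcher\n"],
    ["New York Yankees ", '"Anderson Jason"', 300000, " Pitcher\n"],
]

# Column names are hypothetical; the question does not name them.
df = pd.DataFrame(rows, columns=["team", "player", "salary", "position"])

# Optional cleanup: trim whitespace/newlines, then surrounding double quotes.
for col in ["team", "player", "position"]:
    df[col] = df[col].str.strip().str.strip('"')

print(df)
```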
31
votes
6 answers

How to fill missing value based on other columns in Pandas dataframe?

Suppose I have a 5×3 data frame in which the third column contains missing values: 1 2 3 4 5 NaN 7 8 9 3 2 NaN 5 6 NaN. I want to fill each missing value with the product of the first and second columns: 1 2 3 4 5 20 <--4*5 7 8 9 3 2 6 <-- 3*2 5 6 30…
KyL
  • 429
  • 1
  • 4
  • 5
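
A minimal sketch of the fill rule described in this question, assuming the three columns are named `a`, `b`, and `c` (the question does not name them): `fillna` accepts a Series, so only the missing entries of the third column are replaced by the row-wise product.

```python
import numpy as np
import pandas as pd

# The 5x3 frame from the question; column names are assumed.
df = pd.DataFrame({
    "a": [1, 4, 7, 3, 5],
    "b": [2, 5, 8, 2, 6],
    "c": [3, np.nan, 9, np.nan, np.nan],
})

# Fill only the missing entries of "c" with a * b; existing values are kept.
df["c"] = df["c"].fillna(df["a"] * df["b"])

print(df)
```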
15
votes
4 answers

Pandas: how can I create multi-level columns

I have a pandas DataFrame which has the following columns: n_0 n_1 p_0 p_1 e_0 e_1 I want to transform it to have columns and sub-columns: 0 n p e 1 n p e I've searched in the documentation, and I'm completely lost on how…
Michael Hooreman
  • 793
  • 2
  • 9
  • 21
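
One way to get the layout asked for here is to split each flat column name on the underscore and build a `MultiIndex` from the parts. This is a sketch under assumptions: the row values are invented, and the numeric level is kept as the strings `"0"` and `"1"`.

```python
import pandas as pd

# One illustrative row; the column names come from the question.
df = pd.DataFrame(
    [[1, 2, 3, 4, 5, 6]],
    columns=["n_0", "n_1", "p_0", "p_1", "e_0", "e_1"],
)

# "n_0" -> ("0", "n"): the number becomes the top level, the letter the sub-level.
tuples = [(num, name) for name, num in (col.split("_") for col in df.columns)]
df.columns = pd.MultiIndex.from_tuples(tuples)

# Put the columns in the requested order: 0 -> n, p, e, then 1 -> n, p, e.
wanted = pd.MultiIndex.from_product([["0", "1"], ["n", "p", "e"]])
df = df.reindex(columns=wanted)

print(df)
```

Selecting a top-level key, for example `df["0"]`, then returns the `n`, `p`, and `e` sub-columns for that group.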
14
votes
3 answers

How do I merge two data frames in Python Pandas?

I have two data frames df1 and df2 and I would like to merge them into a single data frame. It is as if df1 and df2 were created by splitting a single data frame down the center vertically, like tearing a piece of paper that contains a list in half…
sebastianspiegel
  • 891
  • 4
  • 11
  • 16
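
If the two frames really are the left and right halves of one table, as the excerpt describes, a side-by-side concatenation may be enough. The sketch below assumes both frames share the same row index; the column names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical halves of a single table, sharing the default row index.
df1 = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df2 = pd.DataFrame({"city": ["X", "Y", "Z"], "score": [10, 20, 30]})

# axis=1 places the frames next to each other, aligning rows on the index.
combined = pd.concat([df1, df2], axis=1)

print(combined)
```

If the halves share a key column instead of an index, `pd.merge` on that key is the more usual choice.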
3
votes
1 answer

Pandas: saving timedelta to Parquet

Looks like Pandas doesn't translate Pandas timedelta to Parquet INTERVAL: >>> import pandas as pd >>> df = pd.DataFrame([{'seconds': 30}]) >>> df.to_parquet('/tmp/test.parquet') # so far so good >>> df['duration'] = pd.to_timedelta(df.seconds,…
Yaniv Aknin
  • 143
  • 5
3
votes
3 answers

In a pandas dataframe can I convert my column values into numbers?

So I am new to all this. I was wondering in pandas can I convert my column values into numbers? I'll try and give an example to explain what I mean So say for example I have a column called, 'animals', in this column I have six different animals but…
Kio
  • 31
  • 1
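
A common way to turn labels into integers is `pd.factorize`, sketched below. The `animals` column name comes from the question; the animal names and the new `animal_code` column are made up.

```python
import pandas as pd

# Illustrative values for the 'animals' column.
df = pd.DataFrame({"animals": ["cat", "dog", "cat", "bird", "dog"]})

# factorize assigns integer codes in order of first appearance
# and also returns the unique labels, so the codes can be mapped back.
codes, uniques = pd.factorize(df["animals"])
df["animal_code"] = codes

print(df)
print(list(uniques))
```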
3
votes
1 answer

Pandas: How can I merge two dataframes?

I found (How do I merge two data frames in Python Pandas?), but do not get the expected result. I have these two CSV files: # f1.csv num ano 76971 1975 76969 1975 76968 1975 76966 1975 76964 1975 76963 1975 76960 1975 and # f2.csv num …
britodfbr
  • 163
  • 1
  • 4
3
votes
2 answers

Is there a way in pandas to import NA fields as a string rather than NaN?

I'm doing a Kaggle challenge, and a lot of entries in the data are NA. However, according to the data description, this doesn't actually mean "missing data", it means something like "Not applicable", in the sense of it just not having that quality…
GrundleMoof
  • 311
  • 2
  • 4
  • 7
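
For this question, `read_csv` can be told not to apply its default missing-value markers. The sketch below reads an in-memory CSV with invented columns `id` and `alley`; treating only empty fields as missing is an assumption about what the data needs.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for the real file (hypothetical columns).
csv_text = "id,alley\n1,NA\n2,Pave\n3,\n"

# keep_default_na=False stops "NA" from being parsed as NaN;
# na_values=[""] still treats truly empty fields as missing.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(df)
```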
3
votes
2 answers

How to convert mixed datetime formats into single one in pandas?

I am working with DataFrame which contains multiple datetime formats in one column. For example: 2020-11-09 00:00:48 2020-11-09 00:00:48 2020-11-09 00:00:48 2020-11-09 00:00:48 2020-11-09…
2
votes
4 answers

Including zero-count categories with pandas value_counts()

I want to include zero-count categories when generating frequencies for categorical variables. Example: df = pd.DataFrame({ 'col1': ['a', 'c', 'a', 'e'], 'col2': ['x', 'y', 'y', 'z'], }) col1 col2 a x c y a y e…
Espoir
  • 51
  • 1
  • 5
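
One way to keep zero-count categories is to reindex the counts against the full list of expected levels. The frame is the one from the question; the level list `all_levels` is an assumption, because the excerpt is cut off before it says which categories should appear.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "c", "a", "e"],
    "col2": ["x", "y", "y", "z"],
})

# Assumed full set of categories for col1.
all_levels = ["a", "b", "c", "d", "e"]

# Count what is present, then add the missing levels with a count of 0.
counts = df["col1"].value_counts().reindex(all_levels, fill_value=0)

print(counts)
```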
2
votes
1 answer

Pandas pivot table, creating ad hoc columns per dimension values

I'm new to pivot tables and have the following dataset: mydict = {'City' : ['Lexington', 'Lexington', 'Louisville', 'Hartford', 'Portland', 'Dallas'], 'State': ['KY', 'KY', 'KY', 'CT', 'ME', 'TX'], 'Zip': ['38293', '38293',…
David
  • 121
  • 2
2
votes
2 answers

Series data structure in pandas

In the overview page of the pandas documentation the Series data structure is described as 'homogeneously-typed'. Data Structures Dimensions Name Description 1 Series 1D labeled homogeneously-typed array 2 DataFrame General 2D labeled,…
Indika K
  • 123
  • 1
  • 4
2
votes
0 answers

pandas dataframes memory

I have a question about memory usage. I want to do 4 things: 1) make a dataframe from one of several columns from a datasource, say a json string 2) make the third column of the original dataset the index to the dataframe 3) change the name of…
user3659451
  • 171
  • 5
2
votes
1 answer

How to calculate the difference based on matching criteria

Hello, I am trying to move from Excel to pandas. I want to add a new column called 'daily_volume' where, if the 'project_name' equals the project_name of the row above, the difference is calculated. For example, 1,424.53 - 1,343.68 = 80.85 My…
Tom
  • 21
  • 1
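
A possible sketch for the row-to-row difference described here uses `groupby(...).diff()`. The source column name `volume` and the sample rows are assumptions (only the figures 1,343.68 and 1,424.53 come from the question), and the sketch assumes rows of the same project are already stored next to each other.

```python
import pandas as pd

# Hypothetical data; 'volume' is an assumed name for the cumulative column.
df = pd.DataFrame({
    "project_name": ["alpha", "alpha", "beta", "beta"],
    "volume": [1343.68, 1424.53, 200.0, 250.5],
})

# Difference to the previous row within each project;
# the first row of every project has no predecessor and stays NaN.
df["daily_volume"] = df.groupby("project_name")["volume"].diff()

print(df)
```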
1
vote
2 answers

Handling conflicting cases pandas python

I have a data set where some rows are identical but belong to different classes. Example - index Heading 1 Heading 2 Heading 1b Heading 2b Class/Target row -1 a b c d 0 row -2 t r f k 0 row -3 m u p l 0 row -4 a b c d 1 row…
Abc1729
  • 15
  • 4