Questions tagged [python]

Use for data science questions related to the programming language Python. Not intended for general coding questions (which should be asked on Stack Overflow).

Python is a general-purpose, dynamic, strongly typed language with many third-party libraries for data science applications. Two major versions have been in wide use: 2 and 3. Python 2 is the "old" version: no new versions were released beyond 2.7, and it reached end of life in January 2020, so it no longer receives even bug fixes. Python 3 is the "new" version, under active development.

Python syntax is relatively easy to comprehend compared to other languages. For example:

numbers = [1, 2, 5, 8, 9]
for number in numbers:
    print("Hello world #", number)

Python has a clean look because whitespace is significant: indentation, not braces, delimits code blocks. While seemingly restrictive, this makes most Python code look alike, which makes reading someone else's code much more predictable. The body of every compound statement (for, while, if, def, etc.) must be indented.
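Because indentation is part of the grammar, a block with a missing indent is rejected before the code ever runs. The small helper below, `compiles` (a name chosen for this example), demonstrates that with the built-in `compile()` on two versions of the loop shown earlier.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles, False if it has an indentation error."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:
        return False
    return True

good = "for number in [1, 2]:\n    print(number)\n"
bad = "for number in [1, 2]:\nprint(number)\n"  # loop body is not indented

print(compiles(good), compiles(bad))  # True False
```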

Popular scientific and data science packages include:

  • NumPy - A fast, N-dimensional array library; the foundation for all things scientific Python.
  • SciPy - Numerical analysis built on NumPy. Provides optimization, linear algebra, Fourier transforms and much else.
  • pandas (from "panel data") - A fast and extremely flexible package that is very useful for data exploration. It handles missing (NaN) data well and offers fast indexing, and it can read and write a wide variety of external data types and file formats.
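A minimal sketch of the first and third packages in action, assuming NumPy and pandas are installed; the array and Series values are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy: vectorised arithmetic over a whole N-dimensional array, no Python loop
a = np.arange(6).reshape(2, 3)        # [[0, 1, 2], [3, 4, 5]]
col_sums = a.sum(axis=0)              # sum down each column

# pandas: reductions skip missing values (NaN) by default
s = pd.Series([1.0, np.nan, 3.0], index=["a", "b", "c"])
mean_ignoring_nan = s.mean()          # (1.0 + 3.0) / 2
filled = s.fillna(0.0)                # replace NaN explicitly
label_lookup = s.loc["c"]             # label-based indexing
```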
6693 questions
12 votes, 2 answers

Python Machine Learning/Data Science Project Structure

I'm looking for information on how a Python machine learning project should be organized. For ordinary Python projects there is Cookiecutter, and for R there is ProjectTemplate. This is my current folder structure, but I'm mixing Jupyter Notebooks with actual…
David Gasquez
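There is no single standard layout, but a common convention (loosely inspired by the Cookiecutter Data Science template) keeps raw data, notebooks and reusable source code in separate folders, so notebooks import from `src` instead of holding the only copy of the logic. The sketch below creates one such hypothetical skeleton with `pathlib`; the folder names are assumptions to adapt.

```python
import tempfile
from pathlib import Path

# One possible layout (an assumption, not a standard); adjust to taste.
LAYOUT = [
    "data/raw",          # original, never-edited input data
    "data/processed",    # cleaned, model-ready data
    "notebooks",         # exploratory Jupyter notebooks
    "src",               # reusable code imported by notebooks and scripts
    "models",            # serialised trained models
    "reports/figures",   # generated plots and reports
]

def scaffold(root):
    """Create the directory skeleton under `root` and return the created paths."""
    base = Path(root)
    created = []
    for rel in LAYOUT:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # An empty __init__.py lets `src` be imported as a package
    (base / "src" / "__init__.py").touch()
    return created

# Usage: build the skeleton in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(tmp)
    all_dirs_exist = all(p.is_dir() for p in created)
    has_init = (Path(tmp) / "src" / "__init__.py").is_file()
```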
7 votes, 3 answers

Are there any online projects I can work on to help fight COVID-19?

Are there any open source projects that a novice data analyst and mathematician can contribute to, to help fight the COVID-19 epidemic? I mean, I know that the best I can do is to stay away from people and now I have a laptop and plenty of time to work on some…
7 votes, 3 answers

How to convert a nested list into a single list in Python?

I have a list that contains ID numbers. Some elements of the list are themselves lists. To convert the nested list into a single list, I wrote a recursive function using the collections module. My code is as follows: from collections import Iterable def…
Younus Ali
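One way to write the recursive flattener on current Python 3. Note that the import in the question, `from collections import Iterable`, fails on Python 3.10 and later because those ABC aliases now live only in `collections.abc`. The sketch treats strings and bytes as single items; the sample `ids` list is hypothetical.

```python
from collections.abc import Iterable

def flatten(items):
    """Recursively yield the leaves of an arbitrarily nested iterable.

    Strings and bytes are treated as atoms, so "abc" is not split into characters.
    """
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)   # descend into the nested list
        else:
            yield item

ids = [1, [2, 3], [4, [5, 6]], 7]
flat = list(flatten(ids))  # [1, 2, 3, 4, 5, 6, 7]
```

Because `flatten` is a generator, it can also be consumed lazily (for example in a `for` loop) without building the full list.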
6 votes, 2 answers

Compare image similarity in Python

I'm using a dataset of movies and would like to identify when a movie is the same across different retailers. Example: Movie: Beauty and the Beast Platforms: Google, Netflix, iTunes, Amazon. I have access to signals like: Studio, Movie Name, Runtime,…
gogasca
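For the image side of the question (for example comparing cover art across retailers), one simple, well-known technique is an average hash: shrink the image to 8x8, mark which cells are brighter than the mean, and compare the resulting bit patterns by Hamming distance. This is a sketch, not the asker's method; it assumes the images are already loaded as 2-D grayscale NumPy arrays at least 8 pixels on each side (loading, e.g. with Pillow, is not shown), and any distance threshold for "same movie" would need tuning on real data. The random arrays stand in for real covers.

```python
import numpy as np

def average_hash(gray, size=8):
    """Average hash of a 2-D grayscale image as a size x size boolean array.

    The image is cropped so its sides are multiples of `size`, block-averaged
    down to size x size, and each cell is compared with the overall mean.
    """
    h, w = gray.shape
    bh, bw = h // size, w // size
    cropped = np.asarray(gray, dtype=float)[: bh * size, : bw * size]
    small = cropped.reshape(size, bh, size, bw).mean(axis=(1, 3))
    return small > small.mean()

def hash_distance(a, b):
    """Hamming distance between two hashes; 0 means identical."""
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, size=(64, 48))
dimmed = cover * 0.5 + 10                      # same image, different brightness/contrast
other = rng.integers(0, 256, size=(64, 48))    # unrelated image

d_same = hash_distance(average_hash(cover), average_hash(dimmed))
d_other = hash_distance(average_hash(cover), average_hash(other))
```

The hash is insensitive to uniform brightness and contrast changes, which is why `d_same` stays small while `d_other` does not. In practice it would be combined with the metadata signals (studio, title, runtime) rather than used alone.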
5 votes, 2 answers

How to sort a list by parameter in Python?

I have a list of employee records. Each tuple of the list represents a person's record, which includes their name, ID, and age. For example, emp_records = [('Karim',100, 45), ('Rahim',10, 30),('Salim', 300,60),('Abu',50,35)] Now, I want to sort…
Reja
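Since the question is cut off before saying which field to sort on, this sketch shows the general pattern, `sorted()` with a `key` function, for each field of the sample records from the question.

```python
from operator import itemgetter

# (name, ID, age) tuples, as in the question
emp_records = [('Karim', 100, 45), ('Rahim', 10, 30), ('Salim', 300, 60), ('Abu', 50, 35)]

by_age = sorted(emp_records, key=itemgetter(2))                   # youngest first
by_id_desc = sorted(emp_records, key=itemgetter(1), reverse=True)  # highest ID first
by_name = sorted(emp_records, key=lambda rec: rec[0])              # alphabetical
```

`sorted()` returns a new list; `emp_records.sort(key=itemgetter(2))` sorts in place instead.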
4 votes, 2 answers

IPython notebook shortcut to run

I am curious whether there is a shortcut for running the selected cell in IPython notebook, such as Alt+F5 or Ctrl+F5 or something similar to what is present in Visual Studio (I don't recall exactly now), because I find it annoying to go with the mouse…
4 votes, 1 answer

How to create an array from a list of arrays in Python

I was trying to write Python code that sets some neural network channels or neurons to zero at inference time, and I wrote the code below. The code generates 10 different arrays for different percentages of the channels or neurons that are set to…
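NumPy offers a few functions for turning a list of equally shaped arrays into one array; which one you want depends on whether the list position should become a new axis. The ten `np.full` arrays below are hypothetical stand-ins for the question's per-percentage outputs.

```python
import numpy as np

# Hypothetical stand-ins: ten arrays of identical shape (4,)
arrays = [np.full(4, p) for p in range(10)]

stacked = np.stack(arrays)        # adds a new leading axis -> shape (10, 4)
rows = np.vstack(arrays)          # same result for 1-D inputs -> shape (10, 4)
flat = np.concatenate(arrays)     # joins end to end -> shape (40,)
```

`np.stack` also works for higher-dimensional arrays (e.g. ten `(C, H, W)` activations become one `(10, C, H, W)` array), and it raises an error if the shapes differ.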
3 votes, 4 answers

LabelEncoding selected columns in a Dataframe using for loop

I have certain columns in my dataset that are of "object" type; I have found them, and now I want to transform them from categorical to numerical data. How can I do it for multiple columns using a for loop? I've been struggling with it. I don't want to…
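A sketch using pandas alone, with a hypothetical three-column frame. It loops over the non-numeric columns (the "object" columns in the question; the check uses `is_numeric_dtype` so it also works where newer pandas versions give strings a dedicated string dtype) and replaces each with integer codes. `pd.factorize(..., sort=True)` numbers the labels in sorted order, the same order scikit-learn's `LabelEncoder` uses, and the `mappings` dict keeps the way back to the original labels.

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
    "price": [1.5, 2.0, 2.5, 3.0],
})

# Find the categorical (non-numeric) columns, then encode each one in a loop.
categorical_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
mappings = {}
for col in categorical_cols:
    codes, uniques = pd.factorize(df[col], sort=True)
    df[col] = codes                   # integer codes 0..k-1
    mappings[col] = list(uniques)     # position i holds the label for code i
```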
3 votes, 3 answers

Approaches to pre-processing huge but organised text data, with and without generators

I have a huge text file, so I'm reading it line by line, applying some basic cleaning, and writing X and Y separately to 2 different CSV files. Further, I'm preparing 3 directories for each CSV (train, val and test) and writing each line as a…
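A generator-based sketch: `read_examples` yields one cleaned example at a time, so memory use stays flat however large the file is, and `split_to_csv` streams each example into a train, validation or test CSV. The `text<TAB>label` line format, the cleaning rules and the 80/10/10 random split are assumptions for illustration. Unlike the question's setup, this writes X and Y as two columns of the same row, which keeps them aligned by construction.

```python
import csv
import random
import re
import tempfile
from pathlib import Path

def read_examples(path):
    """Lazily yield (text, label) pairs, one line at a time.

    Assumes (hypothetically) that each line looks like 'text<TAB>label'.
    """
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.rstrip("\n")
            if "\t" not in line:
                continue                                   # skip malformed lines
            text, label = line.rsplit("\t", 1)
            text = re.sub(r"\s+", " ", text).strip().lower()  # basic cleaning
            yield text, label

def split_to_csv(path, out_dir, ratios=(0.8, 0.1, 0.1), seed=0):
    """Stream examples into train/val/test CSVs; return per-split row counts."""
    rng = random.Random(seed)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    names = ("train", "val", "test")
    files = {n: open(out_dir / f"{n}.csv", "w", newline="", encoding="utf-8") for n in names}
    try:
        writers = {n: csv.writer(f) for n, f in files.items()}
        counts = dict.fromkeys(names, 0)
        for text, label in read_examples(path):
            split = rng.choices(names, weights=ratios)[0]  # random split per line
            writers[split].writerow([text, label])
            counts[split] += 1
        return counts
    finally:
        for f in files.values():
            f.close()

# Usage on a tiny throwaway file
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "corpus.txt"
    src.write_text("Hello   World\tpos\nbad line\nAnother  one\tneg\n", encoding="utf-8")
    examples = list(read_examples(src))
    counts = split_to_csv(src, Path(tmp) / "splits")
```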
3 votes, 2 answers

Read CSV file directly from URL / How to fix a 403 Forbidden error

The CSV file is downloadable. I can download the file and use read_csv, but I want to read the file directly from its URL in Jupyter. I used the following code, but I get an HTTP 403 Forbidden error: from io import StringIO import pandas as pd import…
KHAN irfan
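One common cause of a 403 here is that the server rejects urllib's default `Python-urllib/3.x` User-Agent. The sketch below, assuming that is the cause for your server, sends a browser-like User-Agent and hands the downloaded text to pandas. Some sites block automated downloads on purpose, in which case no header will help.

```python
import io
import urllib.request

import pandas as pd

# Assumption: the server returns 403 for urllib's default User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0"}

def build_request(url):
    """Wrap the URL in a Request that carries the custom headers."""
    return urllib.request.Request(url, headers=HEADERS)

def read_csv_from_url(url, timeout=30.0, encoding="utf-8"):
    """Download the CSV text (needs network access) and parse it with pandas."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        text = resp.read().decode(encoding)
    return pd.read_csv(io.StringIO(text))
```

pandas 1.2 and later can also forward headers for HTTP(S) URLs through `read_csv`'s `storage_options` argument, e.g. `pd.read_csv(url, storage_options={"User-Agent": "Mozilla/5.0"})`.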
3 votes, 1 answer

Replace a value in a column if that value appears only once

I have a dataframe, and I want to replace the values in one column with "other" if that value occurs exactly once in the column: i Food_group 0 Flake 1 Flake 2 Flake 3 Almond 4 Drink 5 …
KHAN irfan
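A pandas sketch: count each value once with `value_counts`, collect the values whose count is exactly 1, and replace them with "other" via `Series.mask`. The `Food_group` values below are illustrative, not the asker's full data.

```python
import pandas as pd

# Illustrative data, not the asker's full column
df = pd.DataFrame({"Food_group": ["Flake", "Flake", "Almond", "Drink", "Drink", "Tea"]})

counts = df["Food_group"].value_counts()        # occurrences of each value
singletons = counts[counts == 1].index          # values seen exactly once
df["Food_group"] = df["Food_group"].mask(df["Food_group"].isin(singletons), "other")
```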
3 votes, 1 answer

Querying DBpedia from Python

How can I get information about an entity from DBpedia using Python? E.g., I need to get all DBpedia information about the USA, so I need to write a SPARQL query from Python and get all attributes of the USA as the result. I tried: PREFIX db:…
Sreejithc321
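A dependency-free sketch that sends a SPARQL query to the public DBpedia endpoint over HTTP and parses the standard SPARQL JSON results format. The endpoint URL, the `format` parameter value and the `United_States` resource name are assumptions to check against DBpedia's documentation; the SPARQLWrapper package is a popular higher-level alternative. The LIMIT keeps the response small, since a well-linked resource can have thousands of triples.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; check DBpedia's documentation for the current URL.
ENDPOINT = "https://dbpedia.org/sparql"

def entity_query(resource, limit=100):
    """SPARQL query for every (predicate, object) pair of one DBpedia resource."""
    return (
        "SELECT ?p ?o WHERE { "
        f"<http://dbpedia.org/resource/{resource}> ?p ?o "
        f"}} LIMIT {limit}"
    )

def build_url(query):
    """Encode the query as a GET request asking for SPARQL JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urllib.parse.urlencode(params)

def fetch_entity(resource, limit=100, timeout=30.0):
    """Run the query (needs network access) and return (predicate, object) strings."""
    with urllib.request.urlopen(build_url(entity_query(resource, limit)), timeout=timeout) as resp:
        data = json.load(resp)
    return [(row["p"]["value"], row["o"]["value"]) for row in data["results"]["bindings"]]

# Usage (network required): facts = fetch_entity("United_States")
```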
3 votes, 0 answers

Grid search prediction results with best parameters on training set

I want to get prediction results from an SVM with the best parameters, but I didn't find a way to get them. How do I get prediction results on K folds? from __future__ import print_function from sklearn import datasets from sklearn.model_selection import…
Rawia Sammout
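With scikit-learn, `GridSearchCV` exposes the refitted model as `best_estimator_`, and `cross_val_predict` returns one out-of-fold prediction per sample for a given K-fold split. The sketch assumes the iris dataset and an illustrative parameter grid, neither taken from the question. Because the same folds were used to choose the parameters, these out-of-fold scores are somewhat optimistic; nested cross-validation avoids that.

```python
from sklearn import datasets
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Illustrative grid; replace with the parameters you are tuning.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=cv)
search.fit(X, y)

# Out-of-fold predictions: each sample is predicted by a model trained without it.
oof_pred = cross_val_predict(search.best_estimator_, X, y, cv=cv)

# Predictions of the refitted best model on the training set (optimistic).
train_pred = search.predict(X)
```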
3 votes, 2 answers

Ways to create simulated numerical data based on small sample (using Python)?

For an internship, I'm being asked to simulate the electrical consumption of virtual appliances (e.g. fridges, freezers). The company currently has a bunch of recorded second-by-second data from several different appliances. The data displays…
samp327
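One common way to generate new data that keeps the short-term structure of a small recorded sample is a moving-block bootstrap: stitch together randomly chosen contiguous chunks of the real series. The sketch below is a swapped-in illustration, not the company's method; the on/off `recorded` signal is a made-up stand-in for a fridge's second-by-second readings, and `block_size` is a tuning knob (a reasonable starting point is at least one full on/off cycle, so cycles are not chopped up).

```python
import numpy as np

def block_bootstrap(series, length, block_size, rng=None):
    """Synthetic series built from randomly chosen contiguous blocks of `series`.

    Requires len(series) >= block_size. `rng` may be a seed or a Generator.
    """
    rng = np.random.default_rng(rng)
    series = np.asarray(series, dtype=float)
    n_blocks = -(-length // block_size)                 # ceiling division
    starts = rng.integers(0, len(series) - block_size + 1, size=n_blocks)
    blocks = [series[s:s + block_size] for s in starts]
    return np.concatenate(blocks)[:length]              # trim to the exact length

# Hypothetical 10 minutes of 1 Hz data: compressor on (120 W) / off (5 W) every 60 s
recorded = np.where((np.arange(600) // 60) % 2 == 0, 120.0, 5.0)
synthetic = block_bootstrap(recorded, length=3600, block_size=120, rng=0)
```

Calling it with different seeds gives many plausible one-hour traces from the same short recording; adding small noise or rescaling blocks are common extensions.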
3 votes, 2 answers

How can I create line plots from a Python dataframe?

I need help creating a plot using 3 different columns from a dataframe. My dataframe looks like this: index CMPGN_NM COST_SUM SUMRY_DT 2 GSA_SMB_SMB_Generic_BMM 8985 2018-05-17 3 GSA_SMB_SMB_Generic_BMM 7456 2018-05-18 4…
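Reshaping the long table into one column per campaign lets pandas draw one line per campaign in a single call. This sketch reuses the question's column names, but the campaign names and values are made up; `matplotlib.use("Agg")` only makes it run headless and should be omitted in a notebook.

```python
import matplotlib
matplotlib.use("Agg")          # render off-screen; omit this line in Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Column names follow the question; the rows are made up for illustration.
df = pd.DataFrame({
    "CMPGN_NM": ["Campaign_A", "Campaign_A", "Campaign_B", "Campaign_B"],
    "COST_SUM": [100, 120, 80, 95],
    "SUMRY_DT": ["2018-05-17", "2018-05-18", "2018-05-17", "2018-05-18"],
})
df["SUMRY_DT"] = pd.to_datetime(df["SUMRY_DT"])

# One row per date, one column per campaign -> one line per campaign.
wide = df.pivot(index="SUMRY_DT", columns="CMPGN_NM", values="COST_SUM")
ax = wide.plot(marker="o")
ax.set_xlabel("Summary date")
ax.set_ylabel("Cost")
ax.set_title("Cost per campaign")
plt.tight_layout()
```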