
For an internship, I'm being asked to simulate the electrical consumption of virtual appliances (e.g. fridges, freezers).

The company currently has a bunch of second-by-second data recorded from several different appliances. The data shows strong patterns and has only two variables: time and power consumed (W). I will need to simulate a large amount of data to train the algorithm. What are the best ways to do this in Python? I have seen some modules with functionality in that vein, but they don't seem suitable for my data format.

samp327

2 Answers


You can find what you need in numpy.random. For example, generating a 100x100 matrix of normally distributed data:

import numpy

mean, sigma = 0.0, 1.0  # placeholder distribution parameters
data = numpy.random.normal(mean, sigma, size=(100, 100))

see the numpy.random documentation

[EDIT]: If you want a more advanced way of generating data similar to an existing dataset, you can look into GANs (generative adversarial networks).

qmeeus
  • I'm aware of how to generate random numbers. My question was how to create a dataset that conforms to the patterns of the existing data. I'm currently looking at using LSTM recurrent neural networks to do this. If you know of any alternative methods, please share them. – samp327 Aug 22 '18 at 12:23
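
A minimal sketch of that LSTM approach, assuming tensorflow.keras; the array watts, the window length, and the training settings are illustrative stand-ins, not the company's actual data or setup. The idea is to train a next-step predictor on sliding windows, then feed the model its own predictions to roll out a synthetic trace:

import numpy as np
from tensorflow import keras

WINDOW = 60  # seconds of history used to predict the next reading (illustrative)

# Stand-in for the real second-by-second recordings, scaled to roughly [0, 1].
watts = 0.5 + 0.5 * np.sin(np.linspace(0, 100, 10_000))

def make_windows(series, window=WINDOW):
    # Slice the trace into (history window, next value) training pairs.
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    return X[..., np.newaxis], series[window:]

model = keras.Sequential([
    keras.layers.Input(shape=(WINDOW, 1)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

X, y = make_windows(watts)
model.fit(X, y, epochs=10, batch_size=128)

def generate(seed, steps):
    # Autoregressive rollout: append each prediction and predict again.
    trace = list(seed)
    for _ in range(steps):
        window = np.array(trace[-WINDOW:])[np.newaxis, :, np.newaxis]
        trace.append(float(model.predict(window, verbose=0)[0, 0]))
    return np.array(trace[len(seed):])

synthetic = generate(watts[:WINDOW], steps=3600)  # one simulated hour

Note that a deterministic rollout like this tends to flatten towards the mean; in practice you would add noise at each step or have the model predict a distribution and sample from it.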

You can look into GANs with LSTM decoders. However, without any additional prior information this will not improve your downstream task by much, because you have not fed the model any new information, which means it has to hallucinate the extra data.
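
A minimal sketch of that GAN idea, assuming tensorflow.keras; the window length, latent size, and the stand-in data are illustrative. The generator uses an LSTM decoder to turn a noise vector into a consumption window, and the discriminator (also an LSTM) tries to tell generated windows from recorded ones:

import numpy as np
import tensorflow as tf
from tensorflow import keras

SEQ_LEN, LATENT = 120, 32  # window length and noise size (illustrative)

# Generator: LSTM decoder that turns a noise vector into a consumption window.
generator = keras.Sequential([
    keras.layers.Input(shape=(LATENT,)),
    keras.layers.RepeatVector(SEQ_LEN),
    keras.layers.LSTM(64, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(1, activation="sigmoid")),
])

# Discriminator: classifies windows as recorded (real) or generated (fake).
discriminator = keras.Sequential([
    keras.layers.Input(shape=(SEQ_LEN, 1)),
    keras.layers.LSTM(64),
    keras.layers.Dense(1),  # logit
])

bce = keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = keras.optimizers.Adam(1e-4)
d_opt = keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real):
    noise = tf.random.normal([tf.shape(real)[0], LATENT])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = generator(noise, training=True)
        real_logits = discriminator(real, training=True)
        fake_logits = discriminator(fake, training=True)
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))

# Stand-in for the real recordings, scaled to [0, 1], cut into windows.
watts = 0.5 + 0.5 * np.sin(np.linspace(0, 200, 24_000))
windows = np.stack([watts[i:i + SEQ_LEN]
                    for i in range(0, len(watts) - SEQ_LEN, SEQ_LEN)])
dataset = tf.data.Dataset.from_tensor_slices(
    windows[..., np.newaxis].astype("float32")).batch(64)

for epoch in range(10):
    for batch in dataset:
        train_step(batch)

samples = generator(tf.random.normal([8, LATENT]), training=False).numpy()

GANs on time series are notoriously fiddly to train; a setup like this usually needs careful tuning of the learning rates and many more epochs before the samples look plausible.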

Another approach is to think about the generating process itself. With a fridge, for example, I imagine there is not a lot of dependency on what happened a day before, except perhaps through the mean level if the appliance is less efficient. You could take segments from different appliances and mix and match them to simulate new samples. Here you introduce a prior bias by assuming some kind of independence between segments.
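
A minimal sketch of that mix-and-match idea, assuming the recordings are 1-D numpy arrays; the segment length and the stand-in traces are illustrative:

import numpy as np

rng = np.random.default_rng(0)
SEG_LEN = 600  # ten-minute segments (illustrative)

def segment(trace, seg_len=SEG_LEN):
    # Cut a recorded trace into non-overlapping fixed-length segments.
    n = len(trace) // seg_len
    return [trace[i * seg_len:(i + 1) * seg_len] for i in range(n)]

# Stand-ins for one-day recordings from two appliances, in watts.
fridge_a = rng.normal(80.0, 5.0, 86_400)
fridge_b = rng.normal(95.0, 7.0, 86_400)
pool = segment(fridge_a) + segment(fridge_b)

def simulate(n_segments):
    # Draw segments independently and concatenate them into a new trace;
    # this is the independence assumption (prior bias) described above.
    picks = rng.choice(len(pool), size=n_segments)
    return np.concatenate([pool[i] for i in picks])

synthetic = simulate(144)  # 144 x 600 s = one simulated day

The hard cuts at segment boundaries can leave visible seams; aligning segments on compressor-cycle boundaries or cross-fading across the joins would soften that.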

Jan van der Vegt