Ethereum: Problem with websocket output into dataframe with pandas

This article looks at a common problem when feeding Binance WebSocket output into a pandas DataFrame:

Problem: Endless loop of data output into a pandas DataFrame

Now that your script has a working WebSocket connection to Binance, there is another common challenge to address: how the incoming data is collected and stored in a pandas DataFrame.

When using a WebSocket API such as Binance's, each message the client receives is typically delivered as a separate element in the payload of the WebSocket event. If every message is appended to a pandas DataFrame one row at a time, the frame grows without bound, and your output loop never terminates.
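To make the failure mode concrete, here is a minimal sketch (with a synthetic `message` dict standing in for a parsed Binance payload) of the per-message concatenation pattern that makes the frame grow without bound. Every `pd.concat` copies the whole frame, so the total cost grows quadratically with the number of messages:

```python
import pandas as pd

# Anti-pattern: growing a DataFrame one message at a time.
df = pd.DataFrame([{"price": 0.0}])  # first "message"

for i in range(1, 5):
    message = {"price": float(i)}  # stand-in for a parsed WebSocket payload
    # Each concat copies the entire existing frame
    df = pd.concat([df, pd.DataFrame([message])], ignore_index=True)

print(len(df))  # → 5
```

With only five messages this is harmless, but a live stream delivering hundreds of updates per second makes the copying cost dominate quickly.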

Why does it happen?

In Binance's WebSocket API, messages arrive continuously, each carrying a timestamp and a payload. When you subscribe to multiple streams (e.g., the Bitcoin price stream and a pair's volume stream), each stream delivers its own sequence of messages. Because the WebSocket connection stays open indefinitely, new messages keep arriving from every stream, producing an endless processing loop.

Solution: Taming infinite data output with pandas

You can use several strategies to avoid this endless accumulation of data and keep your script's memory usage under control:
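A simple strategy that applies to all of the options below is batching: buffer incoming messages in a plain Python list and only convert to a DataFrame in fixed-size chunks, instead of touching the frame on every message. A minimal sketch (the `handle_message` function and `BATCH_SIZE` value are illustrative, not part of any library):

```python
import pandas as pd

BATCH_SIZE = 100
buffer = []   # raw messages waiting to be flushed
frames = []   # flushed DataFrame batches

def handle_message(message):
    """Called once per WebSocket message (message is a parsed dict)."""
    buffer.append(message)
    if len(buffer) >= BATCH_SIZE:
        # Build one DataFrame per batch instead of one per message
        frames.append(pd.DataFrame(buffer))
        buffer.clear()

# Simulate 250 incoming messages
for i in range(250):
    handle_message({"price": float(i)})

# Two full batches of 100 were flushed; 50 messages are still buffered
df = pd.concat(frames, ignore_index=True)
print(len(df))  # → 200
```

Appending to a list is O(1) per message, so the cost of handling each message no longer depends on how much data has already arrived.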


1. Use Dask

Dask is a parallel computing library that lets you scale computations over large datasets without needing a full cluster. With Dask, you can split a large volume of data into smaller partitions and process them in parallel, reducing memory usage.

```python
import numpy as np
import pandas as pd
import dask.dataframe as dd

# Create a DataFrame with 1,000 rows, split into 10 partitions
# (a reasonable chunk size for this example)
ddf = dd.from_pandas(pd.DataFrame({"price": np.random.rand(1000)}), npartitions=10)

# Trigger the computation; partitions are processed in parallel
result = ddf.compute()
```

2. Use a NumPy buffer

If you are working with large amounts of binary data, consider NumPy's buffer-based (or memory-mapped) access for more efficient storage and handling.

```python
import numpy as np
import pandas as pd

# Create an empty list to buffer the data (acting as a NumPy-backed cache)
data = []

# Process each chunk of data in a loop
for i in range(1000):
    # Read bytes from the WebSocket connection into a buffer
    # (b"chunk_data" * 10 stands in here for the raw payload)
    chunk = np.frombuffer(b"chunk_data" * 10, dtype=np.int32)

    # Append the chunk to the buffer list
    data.append(pd.DataFrame({"value": chunk}))

# Combine the buffered chunks into a single DataFrame
df = pd.concat(data, ignore_index=True)

# Now you can run calculations on the full dataset using Dask or pandas
```
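For datasets that outgrow RAM entirely, NumPy's memory-mapped arrays (`np.memmap`) keep the bytes on disk and page them in on demand, so only the portions you actually touch occupy memory. A minimal, self-contained sketch (the file path and array shape are illustrative):

```python
import os
import tempfile

import numpy as np

# Create a disk-backed array; only the pages actually touched are loaded
path = os.path.join(tempfile.mkdtemp(), "ticks.dat")
mm = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))

# Write incoming values as they arrive, without holding them all in RAM
mm[:] = np.arange(1000, dtype=np.float64)
mm.flush()

# Reopen read-only later for analysis
ro = np.memmap(path, dtype=np.float64, mode="r", shape=(1000,))
print(float(ro.sum()))  # → 499500.0
```

In a real pipeline you would append each decoded chunk at a running offset instead of writing the whole array at once.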

3. Use a streaming data-processing library

Libraries such as Starlette provide WebSocket endpoints with streaming request handling, which can sit between a feed like Binance's and your processing code.

```python
import pandas as pd
from starlette.applications import Starlette
from starlette.endpoints import WebSocketEndpoint
from starlette.routing import WebSocketRoute

class WebSocketProcessor(WebSocketEndpoint):
    encoding = "json"

    async def on_receive(self, websocket, data):
        # Receive a message from the WebSocket connection and store it
        # (hand off to Dask here if per-message processing is heavy)
        df = pd.DataFrame({"content": [data]})

        # Acknowledge the processed message back to the client
        await websocket.send_json({"rows_processed": len(df)})

# Launch the server to process incoming connections
app = Starlette(routes=[WebSocketRoute("/ws", WebSocketProcessor)])

# Run with: uvicorn your_module:app --host 0.0.0.0 --port 8000
```

Conclusion

In conclusion, the problem of endless data output into a pandas DataFrame from the Binance WebSocket API can be solved with strategies such as Dask for parallel, chunked processing, or a NumPy-backed buffer for efficient storage and handling.
