Here’s an article on a common problem when streaming Binance WebSocket data into a Pandas DataFrame:
Problem: Endless growth of data in a Pandas DataFrame
Now that you have successfully integrated the Binance WebSocket into your script, there is another common challenge to address: how the incoming data is collected and stored in Pandas.
When consuming a WebSocket API such as Binance’s, each message received by the client is typically appended to the dataset as a separate element. Appending every message to a Pandas DataFrame indefinitely leads to unbounded growth of the frame, which looks like an endless loop of data output.
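As a minimal sketch of the problem (the `message` dict here is a hypothetical stand-in for a parsed stream payload), appending every incoming message grows the frame without bound:

```python
import pandas as pd

df = pd.DataFrame(columns=["price"])

# Each incoming message becomes a new row; nothing is ever evicted
for i in range(3):
    message = {"price": float(i)}  # stand-in for a parsed WebSocket message
    df = pd.concat([df, pd.DataFrame([message])], ignore_index=True)

print(len(df))  # one row per message; on a live stream this grows forever
```

On a real connection the loop never ends, so the frame (and your memory usage) only ever increases.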
Why does it happen?
Binance’s WebSocket API delivers messages continuously, each carrying a timestamp and a payload. When you subscribe to multiple streams (e.g. Bitcoin prices and pair volumes), each stream produces its own separate sequence of messages. Because the WebSocket connection stays open indefinitely, it keeps receiving new messages from every stream, producing an endless flow of incoming data.
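One simple mitigation, sketched here with a hypothetical cap of 1,000 messages, is to keep only the most recent messages in a bounded buffer so old entries are evicted automatically:

```python
from collections import deque

# A bounded buffer: the oldest messages are dropped once the cap is reached
buffer = deque(maxlen=1000)

# Simulate an endless stream delivering far more messages than the cap
for i in range(5000):
    buffer.append({"stream": "btcusdt@trade", "price": float(i)})

print(len(buffer))  # never exceeds 1000
```

This keeps memory usage constant regardless of how long the connection stays open.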
Solution: Taming infinite data output with Pandas
You can use several strategies to avoid this endless accumulation of data and keep your script’s memory usage under control:
1. Use Dask
Dask is a parallel computing library that lets you scale computations on large datasets without needing a full cluster. With Dask, you can split a huge amount of data into smaller partitions and process them in parallel, reducing memory usage.
```python
import dask.dataframe as dd
import numpy as np
import pandas as pd

# Create a DataFrame of 1,000 rows split into 10 partitions (a reasonable chunk size)
d = dd.from_pandas(pd.DataFrame({"price": np.random.rand(1000)}), npartitions=10)

# Run the computation in parallel, 100 rows per partition
result = d.compute()
```
2. Use a NumPy buffer
If you are working with large binary payloads, consider NumPy-based buffering (or memory mapping) for more efficient storage and handling.
```python
import numpy as np
import pandas as pd

# Create an empty list to collect decoded chunks (acting as a buffer)
data = []

# Process each chunk of data in a loop
for i in range(1000):
    # Stand-in for raw bytes read from the WebSocket connection
    chunk = b"\x2a\x00\x00\x00" * 10
    # Decode the raw bytes into a NumPy array and buffer it as a DataFrame
    data.append(pd.DataFrame({"value": np.frombuffer(chunk, dtype=np.int32)}))

# Combine the buffered chunks into a single DataFrame
df = pd.concat(data, ignore_index=True)

# Now you can run calculations on the full dataset with Pandas or Dask
```
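For payloads too large for RAM, the same buffering idea can be backed by a disk file via `np.memmap` (a minimal sketch; the filename and sample values are hypothetical):

```python
import numpy as np

# Write incoming int32 samples into a disk-backed memory map instead of RAM
mm = np.memmap("stream_cache.dat", dtype=np.int32, mode="w+", shape=(1000,))
mm[:] = np.arange(1000, dtype=np.int32)
mm.flush()

# Reopen read-only; pages are loaded from disk on demand, not all at once
ro = np.memmap("stream_cache.dat", dtype=np.int32, mode="r", shape=(1000,))
print(int(ro.sum()))  # 499500
```

Only the pages actually touched by a computation are pulled into memory, so the working set stays small even for very large caches.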
3. Use a streaming data processing library
Libraries such as Starlette let you process Binance WebSocket data as a stream, handling each message as it arrives instead of accumulating everything first.
```python
import pandas as pd
from starlette.applications import Starlette
from starlette.routing import WebSocketRoute
from starlette.websockets import WebSocketDisconnect

frames = []  # buffer of per-message DataFrames

async def websocket_endpoint(websocket):
    await websocket.accept()
    try:
        while True:
            # Get a message from the WebSocket connection
            message = await websocket.receive_json()
            # Process the message and store it as a one-row DataFrame
            frames.append(pd.DataFrame({"content": [message["data"]]}))
    except WebSocketDisconnect:
        pass

# Launch the server to process incoming connections
app = Starlette(routes=[WebSocketRoute("/ws", websocket_endpoint)])
# Run with an ASGI server, e.g.: uvicorn my_module:app --host 0.0.0.0 --port 8000
```
Conclusion
In conclusion, the endless growth of a Pandas DataFrame fed by the Binance WebSocket API can be avoided with strategies such as Dask partitioning, NumPy-based buffering or memory mapping, and streaming message processing, all of which keep storage and computation memory-efficient.