
Can you explain the concept of streams?

I understand that a stream is a representation of a sequence of bytes. Each stream provides means for reading and writing bytes to its given backing store. But what is the point of the stream? Why isn't the backing store itself what we interact with?

For whatever reason this concept just isn't clicking for me. I've read a bunch of articles, but I think I need an analogy or something.


Hosam Aly

The word "stream" has been chosen because it represents (in real life) a very similar meaning to what we want to convey when we use it.

Let's forget about the backing store for a little, and start thinking about the analogy to a water stream. You receive a continuous flow of data, just like water continuously flows in a river. You don't necessarily know where the data is coming from, and most often you don't need to; be it from a file, a socket, or any other source, it doesn't (shouldn't) really matter. This is very similar to receiving a stream of water, whereby you don't need to know where it is coming from; be it from a lake, a fountain, or any other source, it doesn't (shouldn't) really matter.

That said, once you start thinking that you only care about getting the data you need, regardless of where it comes from, the abstractions other people talked about become clearer. You start thinking that you can wrap streams, and your methods will still work perfectly. For example, you could do this:

int ReadInt(StreamReader reader) { return Int32.Parse(reader.ReadLine()); }

// in another method:
Stream fileStream = new FileStream("My Data.dat", FileMode.Open);
Stream zipStream = new ZipDecompressorStream(fileStream);   // hypothetical decompressing wrapper
Stream decryptedStream = new DecryptionStream(zipStream);   // hypothetical decrypting wrapper
StreamReader reader = new StreamReader(decryptedStream);

int x = ReadInt(reader);

As you see, it becomes very easy to change your input source without changing your processing logic. For example, to read your data from a network socket instead of a file:

Stream stream = new NetworkStream(mySocket);
StreamReader reader = new StreamReader(stream);
int x = ReadInt(reader);

As easy as it can be. And the beauty continues, as you can use any kind of input source, as long as you can build a stream "wrapper" for it. You could even do this:

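public class RandomNumbersStreamReader : StreamReader {
    private readonly Random random = new Random();

    // StreamReader has no parameterless constructor, so pass an empty stream.
    public RandomNumbersStreamReader() : base(Stream.Null) { }

    // "override" (rather than a hiding method) so that ReadInt calls this version.
    public override string ReadLine() { return random.Next().ToString(); }
}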

// and to call it:
int x = ReadInt(new RandomNumbersStreamReader());

See? As long as your method doesn't care what the input source is, you can customize your source in various ways. The abstraction allows you to decouple input from processing logic in a very elegant way.

Note that the stream we created ourselves does not have a backing store, but it still serves our purposes perfectly.

So, to summarize, a stream is just a source of input, hiding away (abstracting) another source. As long as you don't break the abstraction, your code will be very flexible.
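
The same decoupling can be sketched in Python. This is only an illustration under assumptions: `read_int` and `RandomNumbersReader` are hypothetical names chosen to mirror the C# above, and the "file" is simulated with in-memory buffers. The processing function only needs something with a `readline()` method.

```python
import gzip
import io
import random


def read_int(reader):
    """Parse one integer from the next line of any object with readline()."""
    return int(reader.readline())


class RandomNumbersReader:
    """Hypothetical reader with no backing store: each line is a random integer."""

    def __init__(self, seed=None):
        self._random = random.Random(seed)

    def readline(self):
        return f"{self._random.randrange(2**31)}\n"


# Source 1: plain text held in memory.
plain = io.StringIO("42\n")

# Source 2: the same text, gzip-compressed, then unwrapped layer by layer.
compressed = io.BytesIO(gzip.compress(b"42\n"))
unzipped = io.TextIOWrapper(
    gzip.GzipFile(fileobj=compressed, mode="rb"), encoding="utf-8"
)

# Source 3: no stored data at all.
generated = RandomNumbersReader(seed=0)

print(read_int(plain), read_int(unzipped), read_int(generated))
```

`read_int` is never changed; only the object handed to it differs.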


Abstract thinking (and explaining) seems to be in your blood ;) Your analogy to water (and thus metaphorical references) reminded me of Omar Khayyam.
@HosamAly Your explanation is very clear, but something in the sample code confuses me a bit. Is the conversion from string to int done automatically by ReadInt? I believe I could write a ReadString too?
@Rushino There are no conversions in the code above. The method ReadInt is defined at the very top using int.Parse, which receives the string returned from reader.ReadLine() and parses it. Of course you could create a similar ReadString method. Is this clear enough?
Well put. Streams to me are the simplest and most powerful generic abstraction in all of programming. Having the basic Stream.CopyTo in .NET makes life so much easier in a lot of applications.
Jon Skeet

The point is that you shouldn't have to know what the backing store is - it's an abstraction over it. Indeed, there might not even be a backing store - you could be reading from a network, and the data is never "stored" at all.

If you can write code that works whether you're talking to a file system, memory, a network or anything else which supports the stream idea, your code is a lot more flexible.

In addition, streams are often chained together - you can have a stream which compresses whatever is put into it, writing the compressed form on to another stream, or one which encrypts the data, etc. At the other end there'd be the reverse chain, decrypting, decompressing or whatever.
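
A minimal Python sketch of such a chain and its reverse, using only the standard library. The function names are hypothetical, and an in-memory `BytesIO` stands in for whatever sits at the bottom of the chain (a file or a socket would be used the same way). Encryption is left out because the standard library has no stream for it.

```python
import gzip
import io


def write_compressed_text(backing, text):
    """Writing chain: text -> UTF-8 bytes -> gzip -> backing stream."""
    zipper = gzip.GzipFile(fileobj=backing, mode="wb")
    with io.TextIOWrapper(zipper, encoding="utf-8") as writer:
        writer.write(text)  # closing the writer also finishes the gzip layer


def read_compressed_text(backing):
    """Reverse chain: backing stream -> gunzip -> UTF-8 decode -> text."""
    unzipper = gzip.GzipFile(fileobj=backing, mode="rb")
    with io.TextIOWrapper(unzipper, encoding="utf-8") as reader:
        return reader.read()


backing = io.BytesIO()  # stands in for a file or a network connection
message = "hello, streams\n" * 1000
write_compressed_text(backing, message)
print(len(message), "characters stored in", len(backing.getvalue()), "bytes")

backing.seek(0)
print(read_compressed_text(backing) == message)
```

Neither function knows what `backing` is; each layer only talks to the layer directly below it.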


Don't the different types of stream readers used in @HosamAly's example above imply that you do know what the backing store is? I take it FileStream, NetworkStream, etc. are reading from those types of sources. Additionally, are there cases where you don't know what the backing store might be, and it is chosen dynamically while the program runs? I just haven't personally come across this and would like to know more.
Also, can streams pipe data through some process as data is generated or do I need access to the full dataset I want to operate on when I begin the process?
@user137717: No, if you just take a StreamReader - or better, a TextReader then your code doesn't know what kind of stream underlies the data flow. Or rather, it can use the BaseStream property to find out the type - but it may be a type that your code has never seen before. The point is that you shouldn't care. And yes, you can absolutely end up writing code which will sometimes be used for a network stream and sometimes be used for a file stream. As for streams piping data through a process - well that wouldn't be done inside the process... it would be the stream provider.
Torlack

The point of the stream is to provide a layer of abstraction between you and the backing store. Thus a given block of code that uses a stream need not care if the backing store is a disk file, memory, etc...


Yeah, it allows you to interchange the type of stream without breaking your code. For example, you could read in from a file on one call and then a memory buffer on the next.
I would add that the reason you would want to do this is that often you don't need file seek capability when reading or writing a file, and thus if you use a stream that same code can easily be used to read from or write to a network socket, for example.
dmajkic

It's not about streams - it's about swimming. If you can swim one Stream, then you can swim any Stream you encounter.


OwenP

To add to the echo chamber, the stream is an abstraction so you don't care about the underlying store. It makes the most sense when you consider scenarios with and without streams.

Files are uninteresting for the most part because streams don't do much above and beyond what non-stream-based methods I'm familiar with did. Let's start with internet files.

If I want to download a file from the internet, I have to open a TCP socket, make a connection, and receive bytes until there are no more bytes. I have to manage a buffer, know the size of the expected file, and write code to detect when the connection is dropped and handle this appropriately.

Let's say I have some sort of TcpDataStream object. I create it with the appropriate connection information, then read bytes from the stream until it says there aren't any more bytes. The stream handles the buffer management, end-of-data conditions, and connection management.

In this way, streams make I/O easier. You could certainly write a TcpFileDownloader class that does what the stream does, but then you have a class that's specific to TCP. Most stream interfaces simply provide a Read() and Write() method, and any more complicated concepts are handled by the internal implementation. Because of this, you can use the same basic code to read or write to memory, disk files, sockets, and many other data stores.
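
As a rough Python illustration of "the same basic code", the hypothetical `read_all` helper below reads until the stream reports no more bytes. It is run against an in-memory buffer and against a socket; a locally connected socket pair is an assumption standing in for a real TCP download.

```python
import io
import socket


def read_all(stream, chunk_size=4096):
    """Read from any binary stream until it reports end of data."""
    chunks = []
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty result means there are no more bytes
            break
        chunks.append(chunk)
    return b"".join(chunks)


# In-memory source.
from_memory = read_all(io.BytesIO(b"cake" * 3))

# Socket source: a connected local pair stands in for a real TCP connection.
sender, receiver = socket.socketpair()
sender.sendall(b"cake" * 3)
sender.close()  # closing the sending side signals end of data
with receiver, receiver.makefile("rb") as net_stream:
    from_socket = read_all(net_stream)

print(from_memory == from_socket)
```

The buffer management and end-of-data detection live inside the stream objects, so `read_all` contains nothing specific to memory or sockets.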


Anonymous

The visualisation I use is conveyor belts, not in real factories because I don't know anything about that, but in cartoon factories where items move along lines and get stamped and boxed and counted and checked by a sequence of dumb devices.

You have simple components that do one thing, for example a device to put a cherry on a cake. This device has an input stream of cherryless cakes, and an output stream of cakes with cherries. Structuring your processing in this way has three advantages worth mentioning.

Firstly it simplifies the components themselves: if you want to put chocolate icing on a cake, you don't need a complicated device that knows everything about cakes, you can create a dumb device that sticks chocolate icing onto whatever is fed into it (in the cartoons, this goes as far as not knowing that the next item in isn't a cake, it's Wile E. Coyote).

Secondly you can create different products by putting the devices into different sequences: maybe you want your cakes to have icing on top of the cherry instead of cherry on top of the icing, and you can do that simply by swapping the devices around on the line.

Thirdly, the devices don't need to manage inventory, boxing, or unboxing. The most efficient way of aggregating and packaging things is changeable: maybe today you're putting your cakes into boxes of 48 and sending them out by the truckload, but tomorrow you want to send out boxes of six in response to custom orders. This kind of change can be accommodated by replacing or reconfiguring the machines at the start and end of the production line; the cherry machine in the middle of the line doesn't have to be changed to process a different number of items at a time, it always works with one item at a time and it doesn't have to know how its input or output is being grouped.
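
The conveyor belt maps quite directly onto Python generators. The sketch below is only an illustration; `bake`, `add_topping` and `box` are made-up names. Each device handles one item at a time, and grouping into boxes happens only at the end of the line.

```python
from itertools import islice


def bake(count):
    """Start of the line: yields plain cakes one at a time."""
    for number in range(1, count + 1):
        yield {"id": number, "toppings": []}


def add_topping(items, topping):
    """Dumb device: adds one topping to whatever comes down the line."""
    for item in items:
        item["toppings"].append(topping)
        yield item


def box(items, size):
    """End of the line: groups items into boxes of the given size."""
    items = iter(items)
    while True:
        batch = list(islice(items, size))
        if not batch:
            return
        yield batch


# Icing first, then the cherry; swap the two calls to change the product.
line = add_topping(add_topping(bake(7), "icing"), "cherry")
for carton in box(line, 3):
    print([cake["id"] for cake in carton])
```

Changing the box size from 3 to 48 touches only the call to `box`; the topping devices are unaffected.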


Great example of analogy-as-explanation.
Julian

When I heard about streaming for the first time, it was in the context of live streaming with a webcam. So, one host is broadcasting video content, and the other host is receiving the video content. So is this streaming? Well... yes... but a live stream is a concrete concept, and I think that the question refers to the abstract concept of Streaming. See https://en.wikipedia.org/wiki/Live_streaming

So let's move on.

Video is not the only resource that can be streamed. Audio can be streamed too. So we are talking about streaming media now. See https://en.wikipedia.org/wiki/Streaming_media . Audio can be delivered from source to target in numerous ways. So let's compare some data delivery methods to each other.

Classic file downloading: this does not happen in real time. You have to wait until the download is complete before you can use the file.

Progressive download: data from the media file is downloaded in chunks into a temporary buffer. Data in that buffer is usable: audio or video in the buffer is playable, so users can watch or listen to the media file while it is still downloading. Fast-forwarding and rewinding are possible, but of course only within the buffer. Either way, progressive download is not live streaming.

Streaming: this happens in real time and delivers data in chunks. Streaming is used for live broadcasts. Clients listening to the broadcast can't fast-forward or rewind. In video streams, data is discarded after playback.

A streaming server keeps a two-way connection with its client, while a web server closes the connection after sending its response.

Audio and video are not the only things that can be streamed. Let's have a look at the concept of streams in the PHP manual.

a stream is a resource object which exhibits streamable behavior. That is, it can be read from or written to in a linear fashion, and may be able to fseek() to an arbitrary location within the stream. Link: https://www.php.net/manual/en/intro.stream.php

In PHP, a resource is a reference to an external source such as a file or a database connection. So in other words, a stream is a source that can be read from or written to. So, if you have worked with fopen(), then you have already worked with streams.

An example of a text file being read as a stream:

// Let's say that cheese.txt is a file that contains this content: 
// I like cheese, a lot! My favorite cheese brand is Leerdammer.
$fp = fopen('cheese.txt', 'r');

$str8 = fread($fp, 8); // read the first 8 bytes from the stream

fseek($fp, 21); // move the position indicator to offset 21 (0 = first position)
$str30 = fread($fp, 30); // read 30 bytes from the stream
fclose($fp); // release the stream when done

echo $str8; // Output: I like c
echo $str30; // Output: " My favorite cheese brand is L" (offset 21 is the space before "My")

Zip files can be streamed too. On top of that, streaming is not limited to files. HTTP, FTP, SSH connections and Input/Output can be streamed as well.

What does Wikipedia say about the concept of streaming?

In computer science, a stream is a sequence of data elements made available over time. A stream can be thought of as items on a conveyor belt being processed one at a time rather than in large batches.

See: https://en.wikipedia.org/wiki/Stream_%28computing%29 .

Wikipedia links to this: https://srfi.schemers.org/srfi-41/srfi-41.html and the writers have this to say about streams:

Streams, sometimes called lazy lists, are a sequential data structure containing elements computed only on demand. A stream is either null or is a pair with a stream in its cdr. Since elements of a stream are computed only when accessed, streams can be infinite.

So a Stream is actually a data structure.
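
To make the "pair with a stream in its cdr" idea concrete, here is a small Python sketch. It is an assumption-laden illustration, not SRFI-41 itself: `None` plays the role of the null stream, the tail is a plain function that is called on demand, and the memoization that SRFI-41 performs is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Stream:
    """A non-empty stream: a head value and a thunk that produces the rest."""

    head: Any
    rest_thunk: Callable[[], Optional["Stream"]]

    def rest(self):
        return self.rest_thunk()


def integers_from(n):
    """Infinite stream n, n+1, n+2, ...; nothing past the head is computed yet."""
    return Stream(n, lambda: integers_from(n + 1))


def take(stream, count):
    """Force only the first `count` elements into a list."""
    result = []
    while stream is not None and len(result) < count:
        result.append(stream.head)
        if len(result) < count:
            stream = stream.rest()
    return result


print(take(integers_from(1), 5))  # [1, 2, 3, 4, 5]
```

Because each tail is computed only when `rest()` is called, `integers_from(1)` returns immediately even though the sequence it describes never ends.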

My conclusion: a stream is a source that can contain data and that can be read from or written to sequentially. A stream does not process everything the source contains at once; it reads or writes piece by piece.

Useful links:

http://www.slideshare.net/auroraeosrose/writing-and-using-php-streams-and-sockets-zendcon-2011 (provides a very clear presentation)
https://www.sk89q.com/2010/04/introduction-to-php-streams/
http://www.netlingo.com/word/stream-or-streaming.php
http://www.brainbell.com/tutorials/php/Using_PHP_Streams.htm
http://www.sitepoint.com/php-streaming-output-buffering-explained/
http://php.net/manual/en/wrappers.php
http://www.digidata-lb.com/streaming/Streaming_Proposal.pdf
http://www.webopedia.com/TERM/S/streaming.html
https://en.wikipedia.org/wiki/Stream_%28computing%29
https://srfi.schemers.org/srfi-41/srfi-41.html


vava

It's just a concept, another level of abstraction that makes your life easier. And they all have a common interface, which means you can combine them in a pipe-like manner. For example, encode to base64, then zip, and then write this to disk, all in one line!
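
A hedged Python sketch of that pipe: `Base64Writer` is a hypothetical wrapper written for this example (the standard library offers base64 functions but no such stream class), while `gzip.open` supplies the compression and file layers. The file name and temporary directory are placeholders.

```python
import base64
import gzip
import os
import tempfile


class Base64Writer:
    """Hypothetical wrapper: base64-encodes bytes and passes them downstream."""

    def __init__(self, downstream):
        self._downstream = downstream
        self._pending = b""  # base64 works on 3-byte groups; hold the remainder

    def write(self, data):
        data = self._pending + data
        usable = len(data) - len(data) % 3
        self._downstream.write(base64.b64encode(data[:usable]))
        self._pending = data[usable:]

    def close(self):
        # Encoding the leftover bytes adds the "=" padding if it is needed.
        self._downstream.write(base64.b64encode(self._pending))
        self._downstream.close()


path = os.path.join(tempfile.mkdtemp(), "payload.b64.gz")

# The whole pipe in one line: base64 -> gzip -> file on disk.
out = Base64Writer(gzip.open(path, "wb"))
out.write(b"stream ")
out.write(b"me")
out.close()

with gzip.open(path, "rb") as check:
    print(base64.b64decode(check.read()))  # b'stream me'
```

The caller only ever sees `write` and `close`, regardless of how many layers sit underneath.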


That's useful, certainly, but I wouldn't say it's the "whole point". Even without chaining it's useful to have a common abstraction.
Yeah, you're right. I've changed the wording to make this clear.
Yup, that's better. Hope you didn't think I was being too picky!
Ken

The best explanation of streams I've seen is chapter 3 of SICP. (You may need to read the first 2 chapters for it to make sense, but you should anyway. :-)

They don't use streams for bytes at all, but rather integers. The big points that I got from it were:

Streams are delayed lists

The computational overhead [of eagerly computing everything ahead of time, in some cases] is outrageous

We can use streams to represent sequences that are infinitely long
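
These points can be illustrated with Python generators, which are delayed sequences in much the same spirit. The sketch below loosely follows the "second prime in a large interval" example from SICP; the function names and the trial-division test are assumptions made for brevity.

```python
from itertools import count, islice


def is_prime(n):
    """Simple trial division; good enough for a demonstration."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def primes_between(low, high):
    """Lazy: a candidate is tested only when the consumer asks for the next prime."""
    return (n for n in range(low, high + 1) if is_prime(n))


# An eager list comprehension would test about 990,000 numbers before
# indexing the second result; the lazy version stops after ten candidates.
second_prime = next(islice(primes_between(10_000, 1_000_000), 1, None))

# An infinitely long sequence is fine as long as only a finite part is consumed.
squares = (n * n for n in count(1))
first_squares = list(islice(squares, 5))

print(second_prime, first_squares)
```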


I'm actually currently on chapter 1 of SICP. Thanks!
It is worth distinguishing SICP streams from other kinds. An important feature of SICP streams is laziness, while the generic stream concept emphasizes abstraction over data sequences.
vikyd

Another point (for the file-reading case):

A stream lets you start doing something with the content before the whole file has been read. It also saves memory, because you do not need to load the entire file at once.
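
A small Python sketch of this, assuming a hypothetical helper `sha256_of_file` and an arbitrary 64 KiB chunk size: the file is hashed while it is being read, and only one chunk is held in memory at a time.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file of any size while holding at most one chunk in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)  # work happens before the rest is read
    return digest.hexdigest()


# Demonstration file; in practice this could be far larger than available memory.
data = b"0123456789" * 100_000
demo_path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(demo_path, "wb") as handle:
    handle.write(data)

print(sha256_of_file(demo_path) == hashlib.sha256(data).hexdigest())
```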


Anton Gogolev

Think of streams as an abstract source of data (bytes, characters, etc.). They abstract the actual mechanics of reading from and writing to the concrete data source, be it a network socket, a file on disk or a response from a web server.


Julian Birch

I think you need to consider that the backing store itself is often just another abstraction. A memory stream is pretty easy to understand, but a file is radically different depending on which file system you're using, never mind what hard drive you are using. Not all streams do in fact sit on top of a backing store: network streams pretty much just are streams.

The point of a stream is that we restrict our attention to what is important. By having a standard abstraction, we can perform common operations. Even if you don't want to, for instance, search a file or an HTTP response for URLs today, that doesn't mean you won't wish to tomorrow.

Streams were originally conceived when memory was tiny compared to storage. Just reading a C file could be a significant load. Minimizing the memory footprint was extremely important. Hence, an abstraction in which very little needed to be loaded was very useful. Today, it is equally useful when performing network communication and, it turns out, rarely that restrictive when we deal with files. The ability to transparently add things like buffering in a general fashion makes it even more useful.
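
Transparent buffering can be sketched in Python as follows. `CountingRaw` is a hypothetical unbuffered source invented for this example; `io.BufferedReader` is the standard wrapper. The consumer reads one byte at a time, yet the underlying source is asked for data only a handful of times.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical unbuffered source that counts how often it is asked for data."""

    def __init__(self, data):
        super().__init__()
        self._source = io.BytesIO(data)
        self.read_calls = 0

    def readable(self):
        return True

    def readinto(self, buffer):
        self.read_calls += 1
        return self._source.readinto(buffer)


raw = CountingRaw(b"x" * 1000)
buffered = io.BufferedReader(raw, buffer_size=256)

# The consumer reads one byte at a time and never mentions buffering.
received = bytearray()
while True:
    byte = buffered.read(1)
    if not byte:
        break
    received += byte

print(len(received), "bytes delivered using", raw.read_calls, "reads of the source")
```

The buffering layer was added without changing either the source or the consuming loop, which is what a shared stream interface makes possible.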


Sean

A stream is an abstraction of a sequence of bytes. The idea is that you don't need to know where the bytes come from, just that you can read them in a standardized manner.

For example, if you process data via a stream then it doesn't matter to your code if the data comes from a file, a network connection, a string, a blob in a database, etc.

There's nothing wrong per se with interacting with the backing store itself, except for the fact that it ties you to the backing store implementation.


Jeff Yates

A stream is an abstraction that provides a standard set of methods and properties for interacting with data. By abstracting away from the actual storage medium, your code can be written without total reliance on what that medium is or even the implementation of that medium.

A good analogy might be to consider a bag. You don't care what a bag is made of or what it does when you put your stuff in it, as long as the bag performs the job of being a bag and you can get your stuff back out. A stream defines for storage media what the concept of bag defines for different instances of a bag (such as trash bag, handbag, rucksack, etc.) - the rules of interaction.


Marcus

I'll keep it short, I was just missing the word here:

Streams are queues, usually stored in a buffer, containing any kind of data.

(Now, since we all know what queues are, there's no need to explain this any further.)
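
For readers who want the queue view spelled out anyway, here is a minimal Python sketch. `ByteQueue` is a made-up class, not a library type: writers append bytes at the back of a buffer and readers take them from the front, in first-in, first-out order.

```python
class ByteQueue:
    """Minimal FIFO byte buffer: writers append at the back, readers take from the front."""

    def __init__(self):
        self._buffer = bytearray()

    def write(self, data):
        self._buffer += data
        return len(data)

    def read(self, size):
        chunk = bytes(self._buffer[:size])
        del self._buffer[:size]  # consumed bytes leave the queue
        return chunk


pipe = ByteQueue()
pipe.write(b"first ")
pipe.write(b"second")
print(pipe.read(5), pipe.read(100))  # b'first' b' second'
```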