Our bodies have the capacity to tell us a lot about our health and what we need to do to feel our best. Alas, we also have the ability to ignore these signs, and much of modern society encourages us to do so.
When people think about "sensors" they usually think about electronic automation. There is a sensor to tell your thermostat whether to turn on the heater or air conditioner. There are sensors in ovens and toasters to indicate when the proper temperature has been applied for the appropriate time. There are sensors in our cars to indicate proper fuel intake, when to shift gears, and even when to apply brakes or throttle. We have similar things within our bodies. Much of robotics is concerned with getting machines to be able to do the same things we do every day.
The first sensors that come to mind for people are our "five senses". These are usually listed as sight, hearing, touch, taste, and smell. Some people add a "sixth sense" to indicate information we take in that is not easily linked to the five physical senses.
However, we also have a lot of internal sensors -- primarily associated with the way that our brains are able to interpret specific signals. We can tell, via pressure at specific points, whether we need to urinate or defecate. We can tell if our stomachs are adequately full. The sense of cold and heat can easily be fooled because it is associated with the way specific nerves under our skin react to temperature differences. Sometimes these interact with certain organs -- such as our inner ear -- to tell us whether we are level or spinning and help with the ability to move smoothly.
The third category of sensors is difficult to understand fully. Our brains have access to much information that requires a host of tests to determine externally. They have access to insulin levels, to endocrine levels, to the amount of oxygen being carried by our blood, and to the levels of neurotransmitters and other chemicals in our brains and bodies. Much of the time, our bodies work with this data automatically by use of the "brain stem". However, it is possible for people to access this information consciously and actively apply responses.
So, what does all of this information tell us? It tells us when we are hungry or full, whether we are hot or cold, and whether the food, drink, or other substance we might bring into our bodies is good for us or not. It tells us whether we need to use the restroom. It also tells us whether we are tired, sad, happy, stressed, excited, and just (all in all) how we feel.
Consider now the various items that often exist in our homes, or in the supermarket/pharmacy, or are advertised as services for us. Many of these exist because we do not pay attention to the information our bodies give us. Why not? The pressures of a time-obsessed society cause us to eat quickly (not giving us time to listen to body signals), and schedules tell us when we can eat, drink, and take care of other bodily functions. The allure of a "quick fix" stops us from adopting a lifestyle with proper exercise and sleep. Calorie-dense food is easily available, and our bodies did not develop to handle it. Plus, we often feel that it is a "reward" to do things that our bodies do not want or need -- that extra-large dessert or an "extra large drink".
It isn't easy to change and our economic society does not encourage us to change. However, if we allow ourselves to listen to what our bodies tell us then we can be healthier and happier.
Sunday, November 24, 2013
Thursday, July 11, 2013
BIG data and data mining
In my household, big data is most directly related to the piles of LEGOs (or LEGO-system building components) that my boys have scattered around the house. Needles in haystacks are more often used as examples. My library of books around the house would be another example. In each case, big data basically means a lot of data.
A lot of anything, of course, is subjective. There are thousands of pieces of straw in a haystack. There are a few thousand books around my house. My boys have a couple of thousands of LEGOs. However, in the world of business (and surveillance) big data usually refers to hundreds of thousands (or even millions) of records -- each of which may have many minutes (audio) or many members (items sold in purchase records or words in emails, for example). Big data is just a way to describe lots of data.
Data mining is the process of finding that special yellow 2 by 2 LEGO in the pile, or finding the needle in the haystack, or finding a specific audio record that talks about things that are considered suspicious or dangerous.
Data mining has three basic components -- collection, storage, and analysis. These are not necessarily discrete stages but we'll discuss them separately (calling out exceptions).
As evidenced by the physical examples at the beginning of this blog, big data has always existed. Consider the stacks of paper birth certificates, or other historical documents that exist and which may need, from time to time, to be searched. The ability to effectively handle, and use, big data has gotten much easier since electronic formats have become standard.
- Collection. Collection usually occurs at the time of transmission (when the originated data is moved to a destination). This might be a phone call. It could be at a point-of-sale (POS) cash register after the order has been finalized. It might be the registration record for a class. Collection may either occur at the intended destination (the company invoice/purchase order database) or via interception. Interception is where collection occurs somewhere other than the intended destination -- "wire tapping", people looking over your shoulder when you enter your credit card security information, and so forth.
Collection can be anonymous or personalized. Personalization basically means that the record is associated with a corporate or living entity. In the case of a sale at a grocery store, the data will be associated with that store (and, possibly, that cashier and cash register). If you use a credit/debit card or a store "club" card, then the data can (and probably will) be associated with the person in addition. Generally, anonymous collection is considered innocuous while personalized collection is not. This does not mean there are not "legitimate" (proper, honorable) reasons to collect personal data, but it does mean that the person may have concerns as to the purpose and safety of the data.
- Storage. This always occurs at some point. However, it may be transitory if the data are removed upon receipt and analysis. Consider a "normal" phone call. The audio message exists (and is stored) from the origination (talking) until the receiving person analyzes it. If the message is redirected (to voice mail, for example), intercepted, or copied, this may turn into a permanent record requiring long-term storage.
Transactions (purchases, registrations, email correspondence) where the data needs to be used in the future are almost all "permanently" stored. Of course, they can still be deleted in the future -- but, without advance knowledge of when, or if, this will occur, they must be considered permanent.
- Analysis. This can occur during the process of collection or it may occur later (after storage). Anonymous data is often analyzed statistically. How many of product X were sold by store Y in city Z? How many of product X were sold in state B? How long is the average voice call within a state? Trends can be analyzed over time. Store Y in city Z sold NN of product X at price B. They sold GG of product X at price C (this can be used to determine overall profit using margin versus quantity sold). Product F sells very well during the time period D through G but not very well in period H through M (a seasonal item to be stocked differently depending on the time of year).
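To make the statistical questions above concrete, here is a small sketch in Python. The records and the helper function are invented for illustration -- real systems would query a database, but the idea of filtering and totaling anonymous records is the same.

```python
# Hypothetical anonymous purchase records: (store, city, product, quantity)
records = [
    ("Y", "Z", "X", 3),
    ("Y", "Z", "X", 2),
    ("Y", "Z", "Q", 1),
    ("W", "B", "X", 5),
]

def units_sold(records, store=None, city=None, product=None):
    """Total quantity sold, optionally filtered by store, city, or product."""
    total = 0
    for s, c, p, qty in records:
        if (store is None or s == store) and \
           (city is None or c == city) and \
           (product is None or p == product):
            total += qty
    return total

# How many of product X were sold by store Y in city Z?
print(units_sold(records, store="Y", city="Z", product="X"))  # 5
# How many of product X were sold everywhere?
print(units_sold(records, product="X"))                       # 10
```

Note that nothing here identifies a person -- this is the kind of aggregate analysis most people do not object to.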
Analysis can also be personalized. Customer ABC buys a lot of product F. Product G is similar but there is a greater profit margin on G -- send Customer ABC coupons for product G to get them to start buying product G on a regular basis. Or Customer DEF only buys product F if the price is below $ZZ.ZZ. Customer BEF is now buying baby products -- notify baby supply companies of contact information.
Finally, analysis can be triggered. Surveillance can use trigger words, or sequences of words (either written or audio) to divert records to further analysis. If you start buying diabetic-related foods and medicines, the data CAN be forwarded to your insurance company (and yes -- if the data is associated with you, then they CAN find your insurance company).
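A triggered filter can be sketched in a few lines. The trigger list and the records below are made up for illustration; real surveillance systems are far more sophisticated, but the basic shape -- divert any record matching a watch list for further analysis -- is the same.

```python
# Toy watch list of trigger words (invented for this example).
TRIGGERS = {"diabetic", "insulin"}

def flag_for_review(records):
    """Return the records containing any trigger word."""
    flagged = []
    for rec in records:
        words = set(rec.lower().split())
        if words & TRIGGERS:   # set intersection: any trigger present?
            flagged.append(rec)
    return flagged

purchases = [
    "bread milk eggs",
    "insulin syringes glucose tablets",
    "coffee filters",
]
print(flag_for_review(purchases))  # ['insulin syringes glucose tablets']
```

Notice that the filter has no idea *why* the purchase was made -- which is exactly how the false conclusions discussed below arise.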
Big data does not change the stages but it does change the methods. There will often be multiple layers of analysis so that each step reduces the number of records to be analyzed. Analysis upon collection will specifically affect the manner in which the data are sorted and stored. And so forth.
People usually don't object to anonymous statistical analysis. They may start feeling threatened with personalized statistical analysis although they may also benefit from the results.
They often will feel threatened with triggered analysis because their "private" data are being used without explicit permission and can be used to exploit the data in some way. In addition, triggers can lead to false conclusions quite easily (you were actually buying diabetes supplies for your great Aunt, you have been reading a book about bad thing XXX and were discussing it with a friend). Big data methods are particularly susceptible to false initial triggers (although, hopefully, further analysis will filter more appropriately).
Friday, June 14, 2013
The Minimum Maximum: what is a bottleneck?
When you have a system that is made of various parts, there will always be some part which limits the overall performance of the system. Within flow situations (water, gas, data, etc.) this is called a "bottleneck". However, this concept can be extended to many things that we encounter in life, so I am going to discuss a more general concept that I call the "Minimum Maximum".
For example, you see a really high-performance car on the road. However, the car performs far below its known capability -- it takes several seconds to start after the light says to go, it cuts corners or drifts across the dividing lines of the road, its speed varies constantly. The reality is that the performance of the car is limited by the lesser of the car's ability and the driver's. If the car is great and the driver's ability is "poor to middlin'" then the car will only perform as well as that driver can drive it. The purchase of a car is based on capability to buy -- not capability to drive. So, the high-performance car is "wasted" on the not-so-good driver. The car can be called "overkill" because its features and performance cannot be used appropriately.
If a race-car driver is behind the wheel of a poor-performance car then she or he will only be able to drive that car to the best of its ability. It is optimum to "tune" the system. Good cars for good drivers and poor cars for poor drivers.
In the world of the Internet, this is more often called a bottleneck. Let's say that you have a broadband connection that can provide 30 megabits per second (Mbps). However, you have an older computer that can only process data at a rate up to 10 Mbps. Your ability to use data from the Internet will be limited to the 10 Mbps of the computer. Going the other direction, if you have a powerful computer but your access to the Internet is limited to 128 kilobits per second (128 kbps), then you might as well get a slower (and less expensive) connection.
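The "Minimum Maximum" idea reduces to a one-line calculation: the effective throughput of a chain of components is the minimum of the individual rates. A small sketch (component rates are the examples from above):

```python
def bottleneck(rates_mbps):
    """Effective throughput of a chain of components is the slowest rate."""
    return min(rates_mbps)

# Broadband at 30 Mbps feeding an older computer that processes 10 Mbps:
print(bottleneck([30, 10]))      # 10 -- the computer limits the system
# Powerful computer (say 100 Mbps) on a 128 kbps (0.128 Mbps) line:
print(bottleneck([100, 0.128]))  # 0.128 -- the connection limits it
```

Whichever component produces the minimum is the one worth upgrading first.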
This would also apply to general gaming (not multiplayer and not connected to the Internet). If your disk drive can access data at 5 Mbps and your processor can only display at a rate of 3 Mbps, then your disk drive is faster than necessary.
The problem with many optimizing (or tuning) situations is that the system is used in a multipurpose way. In the first Internet case, replacing (or upgrading) the computer would increase the overall performance. The same holds true for the second example (upgrading the connection would raise overall performance).
I'm sure that you can think of many other instances. When my family goes off to Church, we often have to wait for the "slowest" -- that person is the Minimum Maximum.
Whenever two or more systems interact, the various parts will work at various speeds/capabilities. Similar to doing Least Common Denominator (LCD) problems in school, the "MinMax" is the way to determine what is slowing down YOUR system (and an indicator as to what might need to be upgraded first).
Thursday, June 6, 2013
The REAL food pyramids: sustaining the foundation
Several organizations around the world have attempted to create graphical representations of what we should eat -- sometimes called "food pyramids" (the current USDA version is a "food plate"). At first, it looks like this should work pretty well since recommended intake percentages are roughly 60/25/15 (carbohydrates/fats/proteins). Unfortunately, when this translates to actual food, most food is composed of a combination of nutritional elements. All fats are not considered to be of the same benefit, and the effect of carbohydrates varies immensely depending on included fiber and other nutritional building blocks. Thus, it is difficult to use a pyramid to represent food needs. Still, people often think in terms of these food pyramids.
However, there are true food pyramids based on the needs and abundance of life on the planet. These are sometimes called trophic pyramids or energy pyramids. At the foundation level of these pyramids exists life that uses the energy from the sun (directly or indirectly) to manufacture food and body. On the land, these organisms are broadly called plants. In the sea, they are broadly called plankton -- although phytoplankton are the specific ones which are able to perform photosynthesis (creation from light).
These foundation foods (or primary producers) are eaten by the "higher layers" of the ecological food pyramids. It is possible for any organism to make use of them directly. For example, whales may feed on krill which are considered to be plankton (although they, in turn, make use of phytoplankton). In general, the lowest level is directly consumed by the next most abundant form of life. In a food pyramid, the next "level" can be determined by either number or function. Another way of putting it would be to think of a cartoon depiction of a very small fish being eaten by a small fish eaten by a medium fish eaten by a giant fish.
The organisms of the next level are called primary consumers. Primary consumers eat primary producers. So, herbivores are a general class of primary consumers. Although we usually think of mammals as herbivores, insects may be herbivores and worms might be considered to be herbivores.
The following level, sometimes called secondary consumers, may be either omnivores or carnivores. That is, they may eat a combination of producers (plants/phytoplankton) and primary consumers (or other secondary consumers) -- or they may be strictly carnivores that eat only other consumers. The "highest" level (in terms of the pyramid) eats only consumers.
These concepts are discussed in various ways -- food chains, food webs, ecological chains, and so forth. In whatever way they are approached, there are producers, primary consumers, secondary consumers, and tertiary consumers. The primary producers are always at the base.
The base determines the overall capacity of the entire pyramid. Thus, it is enormously important to protect that base. In the sea and on land, the largest threat is pollution although global climate change will certainly affect it in various ways. Note, however, that pollution can be either an aspect of waste ("garbage", runoff from managed land, etc.) or deliberate (even if accidental) contamination by oil and chemical spills and use of various chemicals within the food and non-food production chains.
As omnivores, humans have the capacity to shift their herbivore/carnivore balance -- they can be primarily meat eaters or primarily plant eaters. A shift towards the lower levels can allow more food to be available for all.
Thursday, May 23, 2013
Simplified is not necessarily better -- the tyranny of the average
There is a strong tendency in our society to try to make things "simple". I don't know whether it is because we are so time rushed or because we have such an imbalanced system of education. Sometimes having it simplified works to a specific person's benefit -- sometimes it is quite unfavorable to a specific person.
Simplification is often linked closely with statistics. As the saying popularized by Mark Twain goes -- "there are lies, damned lies, and statistics". Statistics are applied math so they should be accurate -- however, the actual use of the formulas and numbers is decided by people, and people make decisions based on what they want to have be true (either consciously or subconsciously).
One example of this is the figures quoted by politicians during election time. "The average income tax went down during my administration". Hmmm. Well, this statement could be "true" if the average income also went down during the time. It could be "true" if the tax rates went down (this is the interpretation the politician probably hopes you will make). It could be "true" if tax rates went down for one segment of the people (usually the high-income group) but went up slightly for other segments (and this has been happening in the U.S. for the past 12 years or so).
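A tiny worked example shows how the third reading can be "true". All the numbers below are invented for illustration: two high earners receive a large cut while eight others pay slightly more, yet the *average* tax paid still falls.

```python
# Hypothetical tax paid by ten people, before and after a "tax cut".
before = [50_000] * 2 + [5_000] * 8   # two high earners, eight others
after  = [30_000] * 2 + [5_500] * 8   # big cut at top, small rise below

avg_before = sum(before) / len(before)   # 14,000
avg_after  = sum(after) / len(after)     # 10,400
print(avg_before, avg_after)             # the average went DOWN...

raised = sum(a > b for a, b in zip(after, before))
print(raised)                            # ...yet 8 of 10 people pay MORE
```

Same facts, opposite impressions -- which is exactly the point.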
Three different realities -- all "supported" by the same "facts".
One area that hits hard for me is the Body Mass Index (BMI) number. The BMI is a fast, easy, "simple" method to indicate whether a person is overweight. It is reasonably accurate (+/- 5% or so) for about eight out of ten people. Who are the other 20% of the population? They are people who are particularly tall (more than 15% above average) or short (20% or more less than average) or people who have large amounts of muscle tissue -- yes, the "fit" are most at risk from inaccuracies on the BMI.
All of this would be academic if it wasn't so easy to fit these simplistic numbers into other formulas -- such as actuarial tables used by insurance companies. So, if you happen to be a body builder then be prepared to pay more from private insurance companies (on life and health) for being "too fat". If you are very short, then you can be rather overweight and still have the insurance count in your favor. If you are very tall, however, then you need to be prepared to pay more once again. Oh -- and I forgot to add -- computerized dating systems tend to like to use BMI in their calculations so expect body builders to be matched up with others of ample dimensions.
The reason the BMI is used is that it is easy and cheap -- take your weight (in kg) and divide it by your height (in meters) squared and you have a handy, dandy, all-purpose number. In order to really find out an accurate number, it is necessary to find real body fat percentage. There are formulas that measure different areas on one's body and then use those in conjunction with weight to get a number that works for 95%+ of the population. But it takes more time and more time means fewer patients seen and that means smaller profits. It is also possible to do a submersion test (where your body is submerged into a tank of water to accurately determine volume) that is the most accurate way to measure density (weight divided by volume equals density).
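The BMI formula described above fits in one line of Python. The body builder's height and weight below are made-up numbers, chosen to show how a fit person lands in the "overweight" band:

```python
def bmi(weight_kg, height_m):
    # BMI = weight (kg) divided by height (m) squared
    return weight_kg / height_m ** 2

# A hypothetical muscular 1.85 m, 100 kg body builder:
print(round(bmi(100, 1.85), 1))  # 29.2 -- "overweight" on the standard
                                 # chart, though body fat may be quite low
```

The number itself is trivially cheap to compute -- which is precisely why it gets reused in actuarial tables and matching systems where it does not belong.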
These are two methods where the average can hurt those who don't fit. There are many others. But do your best to understand the implications of statistical statements -- it may not mean what it seems.
Wednesday, May 8, 2013
What makes Blu-Ray (TM) Blue?
In the last blog, I talked about the difference between continuous (analog) data and discrete (digital) data. Among the most popular, "hands-on", types of data that people use each day are those for audio and video.
In the case of audio, analog data are often considered to be the most "faithful" to the sound. Digital adherents say that the sound stays crisp and clear. They are actually both correct. When analog recordings are made, they are able to reproduce all of the "between" sounds that are dropped during digital recording. While one can debate whether the difference can be heard by most people, it does exist and, therefore, there may be a substantial difference even if only noticed by the subconscious.
Analog recordings primarily fall into two categories -- an engraved reproduction of the sound waves or a magnetic version. Each is capable of continuous data recording. However, such recordings require destructive mechanical mechanisms to be "read" after being recorded. For the "engraved" version (vinyl records, wax cylinders -- yes, all have been used), this means a sharp object following the path of the engraving, which will eventually start eroding the engraving and cause a deterioration of the sound. For magnetic versions, the media (tape, usually) wears while being pulled, and the magnetic material on the tape also gets worn by friction with the reading "head".
Thus, as time goes on and the recording gets used, the recording will get worse -- while, in general, a digital recording will stay the same. So the audiophiles and the digital adherents are both "right".
CHALLENGE: It should be possible to create a commercially viable analog recording medium that can be read non-destructively. With all the bright people and companies employing bright people this should be possible. Make it so!
As discussed in the previous blog, digital media (for audio and video especially) requires decisions as to how much data will be omitted. This is precision and sampling rate. For human speech, it is considered acceptable to take a sample 8,000 times per second and to record each sample with 8 binary units ("bits" -- 8 bits make one byte). This means that a digital recording of human speech will require 480,000 bytes, or 480 KB, per minute of recording. In the case of "high fidelity" digital recordings of music, the sampling rate can be increased and the precision may also be increased. This ends up with a greater amount of data.
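The storage arithmetic above generalizes to any sampling rate and precision. A short sketch:

```python
def storage_per_minute_bytes(sample_rate_hz, bits_per_sample, channels=1):
    # bytes per second = samples/sec * bits/sample * channels / 8,
    # then multiply by 60 seconds.
    return sample_rate_hz * bits_per_sample * channels // 8 * 60

# Telephone-quality speech: 8,000 samples/sec at 8 bits, mono
print(storage_per_minute_bytes(8000, 8))   # 480000 -> 480 KB per minute
```

Raising either the sampling rate or the bit depth scales the storage requirement linearly, which is why "high fidelity" recordings are so much larger.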
Currently, a popular way to record this data is on optical disks. The digital data are marked on the optical media with very, very small pits. A pit can be considered to be a "1" and a land (non-pit) can be considered to be a "0". Note that this is actually very similar to analog engravings except for the nature of the data. Please also note that the exact encoding is actually more complicated than I am saying -- check other sources for more precise descriptions.
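As a toy illustration of the pit/land idea only -- real discs use a more elaborate scheme (eight-to-fourteen modulation, with transitions rather than marks carrying the bits), as noted above -- one can picture each bit mapping to a mark on the disk:

```python
# Toy illustration: map each bit to a physical mark on the disk surface.
# Real optical discs use eight-to-fourteen modulation and encode bits as
# pit/land *transitions*, so this is conceptual, not the actual format.
def to_marks(bits):
    return ["pit" if b == "1" else "land" for b in bits]

print(to_marks("1010"))  # ['pit', 'land', 'pit', 'land']
```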
A larger difference from analog data, however, is how the data are read once recorded. An optical disk uses a laser that can tell whether there is a pit or a land from the way the light reflects off the medium. This reading is non-destructive and, as long as the optical medium is not otherwise damaged, should retain the data unchanged for a long time.
We now enter the third area of data recording: storage space. An audio CD uses a near-infrared laser (wavelength of 780 nm). This wavelength determines how densely the data can be placed on the optical medium. For an audio CD, using a 780 nm laser, about 737 MB (megabytes) of data can be stored in a single layer (a disk CAN have multiple layers, with the laser reading each layer separately). Since this amount of data corresponds to around 80 minutes of music, each minute of an audio CD takes roughly 9 MB of data -- so the precision and sampling rate are much higher than what is considered acceptable for human speech, giving greater "fidelity".
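Dividing out the CD figures above gives the per-minute rate directly:

```python
# Approximate audio CD figures from the text.
CD_CAPACITY_MB = 737  # single layer
CD_MINUTES = 80       # typical maximum playing time

mb_per_minute = CD_CAPACITY_MB / CD_MINUTES
print(round(mb_per_minute, 1))  # 9.2 MB per minute
```

Compare that with the 0.48 MB per minute that suffices for speech: CD audio spends roughly nineteen times the data on each minute of sound.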
The wavelength of the laser determines density -- how "packed" the data can be. This limits the total amount of data in a predetermined physical size. One method of increasing the density is by decreasing the wavelength of the laser.
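A rough sketch of why wavelength matters: the laser's spot area shrinks roughly with the square of the wavelength, so density grows with the squared ratio of old to new wavelength. (This is a simplification -- lens improvements and encoding changes account for much of the real-world capacity gain, so the number below is only the wavelength contribution.)

```python
CD_WAVELENGTH_NM = 780
BLURAY_WAVELENGTH_NM = 405

# Spot area scales roughly with wavelength squared, so density grows
# with (old wavelength / new wavelength) squared.
density_gain = (CD_WAVELENGTH_NM / BLURAY_WAVELENGTH_NM) ** 2
print(round(density_gain, 1))  # about 3.7x from wavelength alone
```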
A Blu-Ray disc uses a "blue" laser with a wavelength of 405 nm (blue is officially considered to be 475 nm, so the laser is actually closer to violet). A single-layer Blu-Ray disk can store about 25 gigabytes (GB) of data. For audio, this would be roughly 45 hours using the same encoding as CDs. DVDs, for comparison, use a wavelength of about 650 nm ("true" red).
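The 45-hour estimate follows from the earlier CD numbers (737 MB for 80 minutes of music):

```python
BLURAY_CAPACITY_MB = 25000   # single layer, about 25 GB
CD_MB_PER_MINUTE = 737 / 80  # CD-quality audio, about 9.2 MB/minute

minutes = BLURAY_CAPACITY_MB / CD_MB_PER_MINUTE
print(round(minutes / 60))  # roughly 45 hours
```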
Sunday, April 21, 2013
Analog and digital data
I thought that I would talk about digital media -- CDs, DVDs, Blu-Ray, and so forth. But then I realized that I really needed to first talk about what digital media are -- and that, in turn, means that it is important to talk about analog.
Analog data are a reflection of events that occur on a continual basis. Such things include time, temperature, sound, moving images, water flowing, and so forth. An analog watch is known by its "face" -- where the "hands" are located to allow a person to interpret the data (information).
It would be completely possible for a watch to have a single hand. All the information is present in the hour hand. However, it is difficult to "read" (interpret) the value with a single hand and, therefore, analog watches and clocks normally have a minute hand and may even have a second hand.
In my old university, they had an analog computer. Set up correctly, it could be used to calculate an exact value for Pi. But this brings up a further problem with analog data -- actually making use of the data in a precise manner. An analog thermometer can give a precise value, but can a person really read it that clearly? Is it saying 98.6 or 98.53?
Digital data can only create approximations of continual information. There are a lot of non-continual data in the world -- particularly in the area of finances. However, when it comes to continual data, you are involved with sampling rates and precision. The sampling rate is how often you "mark down" the information. You take a sample of sound at 1 hour, 20 minutes, 15 seconds, and 180 milliseconds. You then take a sample of the sound every 20 milliseconds -- but, whatever interval you choose, you are also choosing to ignore the data that exists when you are not taking a sample. You will never really know what happened within that 20 millisecond gap. You can guess what it might be -- that is called interpolation -- but you cannot know.
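The gap between samples can be sketched in code. Here a continuous tone stands in for the "analog" signal (the 1 Hz sine wave is my own choice for illustration): we sample it every 20 milliseconds, then interpolate halfway between two samples and compare the guess with the true value -- close, but never guaranteed to be exact:

```python
import math

# A continuous 1 Hz sine wave stands in for the "analog" signal.
def tone(t):
    return math.sin(2 * math.pi * t)

interval = 0.020  # sample every 20 milliseconds
samples = [tone(n * interval) for n in range(50)]

# Guess the value halfway between samples 10 and 11 by linear
# interpolation, then compare with the signal's true value there.
guess = (samples[10] + samples[11]) / 2
true_value = tone(10.5 * interval)
print(abs(guess - true_value) < 0.01)  # True: close, but not identical
```

For a smooth signal the interpolated guess is a good one; for a signal that wiggles faster than the sampling interval, whatever happened in the gap is simply gone.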
The second part of digital data is precision. For money (or other non-continual data), the precision is self-defined by what exists (although other units may exist for formulas -- like taxes). For continual data, the precision is a choice. Do you record 98.5, 98.54, 98.536, 98.5359, or what? Once again, you lose data/information and your choice CAN make a difference if the data are used in a repetitive fashion (such as calculating trajectories for a space ship).
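A small sketch of how a precision choice accumulates in repeated use (the reading and the repetition count here are illustrative, echoing the thermometer figure above):

```python
# The same repeated calculation with two precision choices. The
# low-precision version rounds the reading first, and the error
# accumulates with every repetition.
reading = 98.5359
steps = 10000

precise_total = reading * steps
rounded_total = round(reading, 1) * steps  # stored as 98.5

print(precise_total - rounded_total)  # the accumulated drift
```

A difference of 0.0359 per step is invisible once, but after ten thousand steps it has grown to several hundred units -- exactly the kind of drift that matters for a spaceship trajectory.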
So, analog data are accurate but very difficult to interpret precisely. Digital data are an approximation but have ease of interpretation as a built-in aspect of the choices that are made.
And this leads into the next blog "What makes Blu-Ray (TM) Blue?"