I really get confused by these fields. Everyone mixes up big data, data scientist, and data analyst. Aren't they all different? Where does Hadoop come in? Isn't it a distributed file storage system that spreads NoSQL data across nodes for faster access? Why does big data always need to be unstructured? Which field do I start with first? Who is the person who solves problems by looking at data? And how does a person read big data with their bare eyes? If it is unstructured, it's a mess, just digits. Where do you even start solving problems?
Maybe I'm a mess, but I have so many questions here. Five months back, when I wanted to pursue my FYP in big data, no one was there; so many ignorant people in universities, and suddenly everyone pops up. @Crow @sunny945
Any help/direction would be appreciated. Where do I go to get answers to my questions? Honestly, Google confuses me a lot on this.
I am just a computer science student, but still I want to answer your questions as far as I know.
Basically, in very simple words, if I try to explain things to you: remember that there are some fields in CS that are interconnected with each other.
Now, regarding this area, first of all the main field is Data Mining.
So the question is:
Where does data mining come in, and how can it solve problems?
Before I tell you how it can solve a problem, you first need to know how we set it up:
1: We have a huge amount of data to take care of.
2: The data is usually in textual format, as roughly 90% of the data around the world right now is text.
3: We need a way to save the data, and usually it's not stored on one system but on multiple computers connected together. These computers form a distributed system that uses a distributed file system, i.e., a file system that lets us split our data and save it across multiple connected machines, and that can retrieve the data efficiently when we need it.
4: For this purpose, the most widely used framework is Hadoop, whose file system is called HDFS (Hadoop Distributed File System). We have multiple machines connected together that use Hadoop to store data across multiple nodes/computers.
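To make points 3 and 4 concrete, here is a toy Python sketch of the core idea: split data into fixed-size blocks and replicate each block on several nodes. This is not how HDFS is actually implemented; the block size, node names, and round-robin placement are made-up simplifications just to show the concept.

```python
# Toy sketch (NOT real HDFS): split data into fixed-size blocks and
# assign each block to several nodes, the way a DFS replicates data.
BLOCK_SIZE = 8          # bytes per block (real HDFS defaults to 128 MB)
REPLICATION = 2         # how many copies of each block we keep
NODES = ["node0", "node1", "node2"]

def place_blocks(data: bytes):
    """Return {block_index: {"block": bytes, "nodes": [node, ...]}}."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    placement = {}
    for idx, block in enumerate(blocks):
        # simple round-robin placement with replication
        targets = [NODES[(idx + r) % len(NODES)] for r in range(REPLICATION)]
        placement[idx] = {"block": block, "nodes": targets}
    return placement

layout = place_blocks(b"people who buy bread also buy jam")
for idx, info in layout.items():
    print(idx, info["nodes"])
```

The point is only that no single machine holds the whole file, and losing one node doesn't lose any block, because every block lives on more than one node.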
Now, after our data is saved, suppose we want to solve a problem, say a business problem. One of the most common examples is market basket analysis, in which we determine trends in what people purchase; for example, we might find that 90% of people who purchase bread also purchase jam. This kind of analysis is only possible if our data is saved in a data center.
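Here is a minimal sketch of that bread-and-jam idea in plain Python: count how often items appear together in baskets and compute the confidence of the rule "bread -> jam". The transactions and item names are made up for illustration; real market basket analysis runs algorithms like Apriori over millions of baskets.

```python
# Toy market basket analysis: compute confidence(bread -> jam)
# from a handful of made-up transactions.
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "jam", "milk"},
    {"bread", "jam"},
    {"bread", "butter"},
    {"jam", "eggs"},
    {"bread", "jam", "eggs"},
]

item_counts = Counter()   # how many baskets contain each item
pair_counts = Counter()   # how many baskets contain each item pair
for basket in transactions:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))

# confidence(bread -> jam) = baskets with both / baskets with bread
confidence = pair_counts[("bread", "jam")] / item_counts["bread"]
print(f"confidence(bread -> jam) = {confidence:.2f}")  # 3 of 4 bread baskets
```

So in this toy data, 75% of the people who bought bread also bought jam; that percentage is exactly the kind of "trend" the analysis is after.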
Another question here:
Why save data in a data center?
Ans: The data is so huge that it can't be accessed as it is.
So our data is saved in a data center, ready to process, and to achieve our goals we run data mining algorithms on the backend.
Now the question is:
What is big data?
Ans: Big data is just a term we use for data that is so large and complex that we can't store it in traditional systems.
To implement big data we need:
1: A basic framework, usually a DFS plus a processing layer, like Hadoop (HDFS + MapReduce), GFS, etc.
2: For storing and retrieving data, we usually use a NoSQL database like MongoDB or Cassandra.
3: For data processing, we apply different algorithms on top of Hadoop according to our requirements, like k-means, Apriori, PageRank, KNN, C4.5, etc.
Now, this whole setup explained above is big data.
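To show what one of those processing algorithms actually does, here is a tiny 1-D k-means sketch in plain Python with no libraries. In a real big-data setup this would run distributed (e.g. as a MapReduce job over Hadoop), and the points and starting centers below are made up, but the assignment/update loop is the same idea.

```python
# Tiny 1-D k-means: repeatedly assign points to their nearest center,
# then move each center to the mean of its assigned points.
def kmeans_1d(points, centers, iterations=10):
    for _ in range(iterations):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# two obvious groups: values near 1.5 and values near 10.5
print(kmeans_1d([1.0, 2.0, 1.5, 10.0, 11.0, 10.5], [0.0, 5.0]))  # [1.5, 10.5]
```

On big data, the "assign" step is what gets parallelized across nodes: each machine assigns its own chunk of points, and only the per-cluster sums and counts are sent back to compute the new centers.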
The people who set up this stack, run the algorithms, and analyse the data are data scientists and data analysts.
Now the question arises: where can I start?
Ans: Data mining, because when you study data mining you will read about the concepts of:
Data warehouse
Data Mining
Text Mining
Classification
Statistics
and many other basics that are always required if you want to go into this field. After you learn the basics, you will get an idea of these bigger topics on your own. But if you start from big data directly, you will end up knowing just the buzzwords that are used for marketing and nothing else.
I tried to explain everything as far as I know about it, but still, as I said, I am just a CS student.